Star Hotels Case Study

Problem Statement

A significant number of hotel bookings are called off due to cancellations or no-shows. The typical reasons for cancellations include change of plans, scheduling conflicts, etc. This is often made easier by the option to do so free of charge or preferably at a low cost which is beneficial to hotel guests but it is a less desirable and possibly revenue-diminishing factor for hotels to deal with. Such losses are particularly high on last-minute cancellations.

The new technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.

The cancellation of bookings impact a hotel on various fronts:

  1. Loss of resources (revenue) when the hotel cannot resell the room.
  2. Additional costs of distribution channels by increasing commissions or paying for publicity to help sell these rooms.
  3. Lowering prices last minute, so the hotel can resell a room, resulting in reducing the profit margin.
  4. Human resources to make arrangements for the guests.

Objective

The increasing number of cancellations calls for a Machine Learning based solution that can help in predicting which booking is likely to be canceled. Star Hotels Group has a chain of hotels in Portugal, they are facing problems with the high number of booking cancellations require data-driven solutions. In this report, we analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict which booking is going to be canceled in advance, and help in formulating profitable policies for cancellations and refunds.

Data Dictionary

The provided dataset contains the following columns:

  1. no_of_adults: Number of adults
  2. no_of_children: Number of Children
  3. no_of_weekend_nights: Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel
  4. no_of_week_nights: Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel
  5. type_of_meal_plan: Type of meal plan booked by the customer:
    • Not Selected – No meal plan selected
    • Meal Plan 1 – Breakfast
    • Meal Plan 2 – Half board (breakfast and one other meal)
    • Meal Plan 3 – Full board (breakfast, lunch, and dinner)
  1. required_car_parking_space: Does the customer require a car parking space? (0 - No, 1- Yes)
  2. room_type_reserved: Type of room reserved by the customer. The values are ciphered (encoded) by Star Hotels Group
  3. lead_time: Number of days between the date of booking and the arrival date
  4. arrival_year: Year of arrival date
  5. arrival_month: Month of arrival date
  6. arrival_date: Date of the month
  7. market_segment_type: Market segment designation.
  8. repeated_guest: Is the customer a repeated guest? (0 - No, 1- Yes)
  9. no_of_previous_cancellations: Number of previous bookings that were canceled by the customer prior to the current booking
  10. no_of_previous_bookings_not_canceled: Number of previous bookings not canceled by the customer prior to the current booking
  11. avg_price_per_room: Average price per day of the reservation; prices of the rooms are dynamic. (in euros)
  12. no_of_special_requests: Total number of special requests made by the customer (e.g. high floor, view from the room, etc)
  13. booking_status: Flag indicating if the booking was canceled or not.

Importing Nessesory Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pylab
import scipy.stats as stats

#Removes the limit from the number of displayed columns and rows.
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 200)

#Using plotly for specific plots of categorical variables
import plotly.graph_objects as go
import plotly.express as px
from  plotly.subplots import make_subplots
import plotly.io as pio

#Add a nice bachground to graphs and show graghs in the notebood
sns.set(color_codes=True)
%matplotlib inline 

#Function to randomly split the data into train data and test data
from sklearn.model_selection import train_test_split  

#To build logistic regression_model using sklearn
from sklearn.linear_model import LogisticRegression

#To build logistic regression_model using statsmodels
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant

#!pip install -U scikit-learn --user

# Libraries to build decision tree classifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree

# To tune different decision tree models
from sklearn.model_selection import GridSearchCV

# To get diferent metric scores
from sklearn.metrics import (
    f1_score,
    accuracy_score,
    recall_score,
    precision_score,
    confusion_matrix,
    plot_confusion_matrix,
    make_scorer,
    roc_auc_score,
    precision_recall_curve,
    roc_curve,
)

from sklearn.metrics import confusion_matrix

#To change numeric month to month name
import calendar

# Library to suppress warnings or deprecation notes
import warnings
warnings.filterwarnings("ignore")

Importing Data

In [169]:
#importing DataFrame with the name "used_phone_data.csv"
data=pd.read_csv('C:/Users/Adis/Desktop/Data Science/Project 4- Classification/StarHotelsGroup.csv')
data.head()
Out[169]:
no_of_adults no_of_children no_of_weekend_nights no_of_week_nights type_of_meal_plan required_car_parking_space room_type_reserved lead_time arrival_year arrival_month arrival_date market_segment_type repeated_guest no_of_previous_cancellations no_of_previous_bookings_not_canceled avg_price_per_room no_of_special_requests booking_status
0 2 0 1 2 Meal Plan 1 0 Room_Type 1 224 2017 10 2 Offline 0 0 0 65.00 0 Not_Canceled
1 2 0 2 3 Not Selected 0 Room_Type 1 5 2018 11 6 Online 0 0 0 106.68 1 Not_Canceled
2 1 0 2 1 Meal Plan 1 0 Room_Type 1 1 2018 2 28 Online 0 0 0 60.00 0 Canceled
3 2 0 0 2 Meal Plan 1 0 Room_Type 1 211 2018 5 20 Online 0 0 0 100.00 0 Canceled
4 3 0 0 3 Not Selected 0 Room_Type 1 277 2019 7 13 Online 0 0 0 89.10 2 Canceled
In [3]:
print(f'There are {data.shape[1]} columns and {data.shape[0]} rows in the data set.')  # f-string
There are 18 columns and 56926 rows in the data set.

Let us take a look at the imported data and the summary of different columns:

In [4]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56926 entries, 0 to 56925
Data columns (total 18 columns):
no_of_adults                            56926 non-null int64
no_of_children                          56926 non-null int64
no_of_weekend_nights                    56926 non-null int64
no_of_week_nights                       56926 non-null int64
type_of_meal_plan                       56926 non-null object
required_car_parking_space              56926 non-null int64
room_type_reserved                      56926 non-null object
lead_time                               56926 non-null int64
arrival_year                            56926 non-null int64
arrival_month                           56926 non-null int64
arrival_date                            56926 non-null int64
market_segment_type                     56926 non-null object
repeated_guest                          56926 non-null int64
no_of_previous_cancellations            56926 non-null int64
no_of_previous_bookings_not_canceled    56926 non-null int64
avg_price_per_room                      56926 non-null float64
no_of_special_requests                  56926 non-null int64
booking_status                          56926 non-null object
dtypes: float64(1), int64(13), object(4)
memory usage: 7.8+ MB

Four of the columns represent categorical variables (qualitative), i.e.:

  • type_of_meal_plan
  • room_type_reserved
  • market_segment_type
  • booking_status

And 14 other columns represent quantitative variables:

  • no_of_adults
  • no_of_children
  • no_of_weekend_nights
  • no_of_week_nights
  • required_car_parking_space
  • lead_time
  • arrival_year
  • arrival_month
  • arrival_date
  • repeated_guest
  • no_of_previous_cancellations
  • no_of_previous_bookings_not_canceled
  • avg_price_per_room
  • no_of_special_requests

Now we check the missing values in the data. Below, number of missing values in any column of the imported data are shown:

In [5]:
data.isnull().sum()
Out[5]:
no_of_adults                            0
no_of_children                          0
no_of_weekend_nights                    0
no_of_week_nights                       0
type_of_meal_plan                       0
required_car_parking_space              0
room_type_reserved                      0
lead_time                               0
arrival_year                            0
arrival_month                           0
arrival_date                            0
market_segment_type                     0
repeated_guest                          0
no_of_previous_cancellations            0
no_of_previous_bookings_not_canceled    0
avg_price_per_room                      0
no_of_special_requests                  0
booking_status                          0
dtype: int64

We can see that there are no missing values in our dataframe.

2. Exploratory Data Analysis (EDA)

  • EDA is an important part of any project involving data.
  • It is important to investigate and understand the data better before building a model with it.
  • A few questions have been mentioned below which will help you approach the analysis in the right manner and generate insights from the data.

Questions:

1. What are the busiest months in the hotel?

First we convert month int to month name.

Next, we tabulate frequencies of each arrival month as below:

In [6]:
# making data fram to show the month name and frequency
df_month=data['arrival_month'].value_counts().to_frame()
df_month.rename(columns={'arrival_month': 'frequencies'},inplace=True)
df_month.reset_index(inplace=True)
df_month = df_month.rename(columns = {'index':'arrival month'})

#we convert month int to month name
df_month['arrival month'] = df_month['arrival month'].apply(lambda x: calendar.month_abbr[x])
df_month
Out[6]:
arrival month frequencies
0 Aug 6402
1 Jun 6238
2 Jul 5870
3 May 5832
4 Apr 5664
5 Oct 5317
6 Mar 4872
7 Sep 4611
8 Feb 3476
9 Dec 3021
10 Nov 2980
11 Jan 2643

Graph below shows frequncy of each moth in descending order.

In [7]:
sns.set_style("whitegrid")

fig = plt.figure(figsize=(15, 4));

# Adds subplot on position 1
fig.add_subplot(121)


# plot the barchart
ax = data['arrival_month'].value_counts().plot(kind="bar", rot=90)
# Make twin axis
ax2 = ax.twinx()

# display counts on each bar
for p in ax.patches:
    ax.annotate('{}'.format(p.get_height()), (p.get_x() -0.1, p.get_height()+10) , fontsize=12, weight='bold')

#adding labels
ax.set(xlabel='arrival month', ylabel='count');

Observations:

  • We can see that the most frequent months are August, June, July, May, and April, respectively.

2. Which market segment do most of the guests come from?

In [8]:
data['market_segment_type'].value_counts()
Out[8]:
Online           39490
Offline          13875
Corporate         2796
Complementary      536
Aviation           229
Name: market_segment_type, dtype: int64
In [9]:
sns.set_style("darkgrid")

fig = plt.figure(figsize=(15, 4));

# Adds subplot on position 1
fig.add_subplot(121)

# plot the barchart
ax = data['market_segment_type'].value_counts().plot(kind="bar", rot=90)

# display counts on each bar
for p in ax.patches:
    ax.annotate('{}'.format(p.get_height()), (p.get_x() -0.1, p.get_height()+10) , fontsize=12, weight='bold')

#adding labels
ax.set(xlabel='market segment type', ylabel='count');

Observations:

  • We can see that the most quests come from Online market segment.

3. Hotel rates are dynamic and change according to demand and customer demographics. What are the differences in room prices in different market segments?

In [10]:
graph=sns.catplot(data=data, x="avg_price_per_room", y="market_segment_type", 
            kind="box", height=4, aspect=2);

#adding labels
graph.set(xlabel='average price per room', ylabel='market segment');

Observations:

  • We can see that the median of the average room prices of the Online market segment is higher compared to the rest of the market segments.
  • Dispersion of average room prices related to the Complementary market segment is lower than other market segments. Which means that room price changes are lower for the Complementary section.
  • In the Online market segment, room prices change the most.
In [11]:
graph=sns.catplot(data=data, x="avg_price_per_room", y="market_segment_type", hue='room_type_reserved',
            kind="box", height=4, aspect=2);

#adding labels
graph.set(xlabel='average price per room', ylabel='market segment');

Price changes are highest for combination of room type 7 with Offline and Corporate market segments.

4. What percentage of bookings are canceled?

In [12]:
print('{}% of bookings are canceled.'.format(round(data['booking_status'].value_counts(normalize=True).mul(100)[1],2)))
37.85% of bookings are canceled.

5. Repeating guests are the guests who stay in the hotel often and are important to brand equity. What percentage of repeating guests cancel?

In [13]:
print('Only {}% of repeating guests cancel their booking. Hence, we can conclude that most of the cancelations are done by non-repeating guests.'.format(round(data[data['repeated_guest']==1]['booking_status'].value_counts(normalize=True).mul(100)[1],2)))
Only 1.28% of repeating guests cancel their booking. Hence, we can conclude that most of the cancelations are done by non-repeating guests.

6. Many guests have special requirements when booking a hotel room. Do these requirements affect booking cancellation?

In [14]:
#plotting
graph=sns.catplot(data=data, y="no_of_special_requests", x="booking_status", 
            kind="violin", height=4, aspect=2);

#adding labels
graph.set(xlabel='booking status', ylabel='number of special requests');
In [15]:
#definig the color
color=sns. set_palette("Set3");

#plotting

graph = sns.FacetGrid(data, col="no_of_special_requests", hue='booking_status', palette=color)
graph.map(sns.histplot, "booking_status" );

#adding labels
graph.set(xlabel='booking status', ylabel='count');
In [16]:
#plotting

graph = sns.FacetGrid(data, col="booking_status", hue='no_of_special_requests')
graph.map(sns.histplot, "no_of_special_requests" );

#adding labels
graph.set(xlabel='number of special requests', ylabel='count');

Observations:

  • We can see that most of the quests do not have any special requests.
  • Very few guests make 3, 4 and 5 special requests for their reservations.
  • Most of the guests who have canceled their resrvations did not had any special request.
  • Guests who have canceled their resrvations had 0, 1, or 2 requests.
  • There has been no cancelations when 3, 4, or 5 special requests were made by a guest.

Summary of Quantitative Variables

In [17]:
data.describe().T
Out[17]:
count mean std min 25% 50% 75% max
no_of_adults 56926.0 1.875856 0.518667 0.0 2.0 2.0 2.0 4.0
no_of_children 56926.0 0.110723 0.408885 0.0 0.0 0.0 0.0 10.0
no_of_weekend_nights 56926.0 0.835840 0.875900 0.0 0.0 1.0 2.0 8.0
no_of_week_nights 56926.0 2.261901 1.432371 0.0 1.0 2.0 3.0 17.0
required_car_parking_space 56926.0 0.026332 0.160123 0.0 0.0 0.0 0.0 1.0
lead_time 56926.0 93.713909 92.408296 0.0 21.0 65.0 142.0 521.0
arrival_year 56926.0 2018.248340 0.644619 2017.0 2018.0 2018.0 2019.0 2019.0
arrival_month 56926.0 6.490215 3.027185 1.0 4.0 6.0 9.0 12.0
arrival_date 56926.0 15.635913 8.718717 1.0 8.0 16.0 23.0 31.0
repeated_guest 56926.0 0.024664 0.155099 0.0 0.0 0.0 0.0 1.0
no_of_previous_cancellations 56926.0 0.020939 0.326142 0.0 0.0 0.0 0.0 13.0
no_of_previous_bookings_not_canceled 56926.0 0.167902 1.943647 0.0 0.0 0.0 0.0 72.0
avg_price_per_room 56926.0 109.610570 38.256075 0.0 85.0 105.0 129.7 540.0
no_of_special_requests 56926.0 0.666040 0.814257 0.0 0.0 0.0 1.0 5.0

Observations:

  • We can see variables required_car_parking_space, repeated_guest have binary values.
  • Dispersion of variable no_of_previous_bookings_not_canceled seems to be very high. This variable ranges from 0 to 72, while its median is 0 which does not seam sensible.
  • Moreover, dispersion of variable avg_price_per_room seems to be high. This variable ranges from 0 to 540, while its median is 105.

Correlations

In [18]:
#plotting heat map

df_corr=data.corr()

fig = plt.figure(figsize=(10, 8));

# color map
cmap = sns.diverging_palette(0, 230, 90, 60, as_cmap=True)

# plot heatmap
ax=sns.heatmap(df_corr,  annot=True, fmt=".2f", linewidths=5, cmap=cmap, vmin=-1, vmax=1, square=True);

plt.title('Correlation heat map for the entire data');

fig.tight_layout()

Observations:

  • The correlations values for quantitative variables are shown above. We can not see a strog positive or strong negative correlations between any of the variables.

Let us check values of each categorical variable:

In [19]:
# looking at value counts for non-numeric features

num_to_display = 12  # defining number of displayed levels for each non-numeric feature
for colname in data.dtypes[data.dtypes == 'object'].index:
    
    val_counts = data[colname].value_counts(dropna=False)  # Show NA counts
    print(f'\n\ncategrical variable= {colname} ') #f-String
    
    if len(val_counts) > num_to_display:
        print(f'Only displaying first {num_to_display} of {len(val_counts)} values.\n') #f-String

    print(val_counts.iloc[:num_to_display])

categrical variable= type_of_meal_plan 
Meal Plan 1     42330
Not Selected    10072
Meal Plan 2      4516
Meal Plan 3         8
Name: type_of_meal_plan, dtype: int64


categrical variable= room_type_reserved 
Room_Type 1    42807
Room_Type 4    10413
Room_Type 6     1581
Room_Type 5      983
Room_Type 2      823
Room_Type 7      312
Room_Type 3        7
Name: room_type_reserved, dtype: int64


categrical variable= market_segment_type 
Online           39490
Offline          13875
Corporate         2796
Complementary      536
Aviation           229
Name: market_segment_type, dtype: int64


categrical variable= booking_status 
Not_Canceled    35378
Canceled        21548
Name: booking_status, dtype: int64

Now, Let us check quantitative variable:

In [20]:
def histogram_boxplot(data, feature, figsize=(10, 5), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (15,10))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a star will indicate the mean value of the column
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
In [21]:
histogram_boxplot(data, "lead_time", bins=70)

Lead time values mostly between 0 and 150 days. There seems to be a lot of outliers in the variable.

In [22]:
histogram_boxplot(data, "no_of_previous_bookings_not_canceled", bins=70)

Most of the quests do not have any previous bookings that has not been canceled. This can be due to the fact that they have never have booked a room in the Stars Hotels.

In [23]:
histogram_boxplot(data, "avg_price_per_room",)

avg_price_per_room seems to have a normal distributions.

In [24]:
# function to create labeled barplots

def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """

    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 2, 4))
    else:
        plt.figure(figsize=(n + 2, 4))
    

    plt.xticks(rotation=90, fontsize=12)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )

    for p in ax.patches:
        if perc == True:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # width of the plot
        y = p.get_height()  # height of the plot

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the percentage

    plt.show()  # show the plot
In [25]:
# function to create labeled barplots

def labeled_barplot_hue(data, feature, hue, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """

    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 2, 4))
    else:
        plt.figure(figsize=(n + 2, 4))
    

    plt.xticks(rotation=90, fontsize=12)
    ax = sns.countplot(
        data=data,
        x=feature,
        hue=hue,
        palette="husl",
        order=data[feature].value_counts().index[:n].sort_values(),
    )

    for p in ax.patches:
        if perc == True:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # width of the plot
        y = p.get_height()  # height of the plot

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the percentage

    plt.show()  # show the plot
In [26]:
labeled_barplot(data, "type_of_meal_plan", perc=True)

The most frequent meal plan is Plan 1 with 74.4% frequency. Plan 3 is not desirable for quests.

In [27]:
labeled_barplot_hue(data, "type_of_meal_plan", "booking_status", perc=True)

No apparent connection is detected between meal plans with cancelations for plan 1.

In [28]:
labeled_barplot(data, "room_type_reserved", perc=True)

Room type 1 is the most prefered room by quests.

In [164]:
labeled_barplot_hue(data, "room_type_reserved", "booking_status", perc=True)

Cancallations and room types does not seem to have any pecific correlations.

In [166]:
labeled_barplot_hue(data, "market_segment_type", "booking_status", perc=True)

We devide the list of the numerical values into two groups to draw their pair plot in a way that is better for visualization.

In [178]:
# numerical columns
columns=data.dtypes[data.dtypes != 'object'].index
columns1=columns[0:7]
columns2=columns[7:]
In [181]:
sns.pairplot(data=data[columns1]);
In [182]:
sns.pairplot(data=data[columns2]);
Summary of obsrvations:
  • Variables required_car_parking_space, and repeated_guest are binary.
  • Dispersion of variable no_of_previous_bookings_not_canceled seems to be very high.
  • The correlations values for quantitative variables are shown above. We can not see a strong positive or strong negative correlations between any of the variables.
  • The most frequent months for bookings are August, June, July, May, and April, respectively.
  • Lead time values mostly between 0 and 150 days are distribution is skewed. There seems to be a lot of outliers in the variable.
  • Most of the quests do not have any previous bookings that has not been canceled.
  • avg_price_per_room seems to have a normal distributions.
  • The most frequent meal plan is Plan 1 with 74.4% frequency. Plan 3 is not desirable for quests.
  • No apparent connection is detected between meal plans with cancelations for plan 1.
  • Room type 1 is the most preferred room by quests.
  • There is no apparent dependency between room type and cancelations.
  • Room price changes are highest for the online market section and lowest for the Complementary section.
  • Price changes are highest for combination of room type 7 with Offline and Corporate market segments.
  • Most of the quests do not have any special requests. Guests who have canceled their reservations had 0, 1, or 2 requests.

3. Data Preprocessing

  • Missing value treatment
  • Feature engineering
  • Outlier detection and treatment
  • Preparing data for modeling
  • Any other preprocessing steps

3.1 Missing Value Treatment

In [35]:
data.isnull().sum()
Out[35]:
no_of_adults                            0
no_of_children                          0
no_of_weekend_nights                    0
no_of_week_nights                       0
type_of_meal_plan                       0
required_car_parking_space              0
room_type_reserved                      0
lead_time                               0
arrival_year                            0
arrival_month                           0
arrival_date                            0
market_segment_type                     0
repeated_guest                          0
no_of_previous_cancellations            0
no_of_previous_bookings_not_canceled    0
avg_price_per_room                      0
no_of_special_requests                  0
booking_status                          0
dtype: int64

There are no missing values in te dataframe.

3.2 Duplicate value check

Now, we want to figure out whether we have duplicate data and how to deal with them.

In [183]:
# Count the number of non-duplicates
(~data.duplicated()).sum()

# Count the number of duplicates in rows
data.duplicated().sum()

print('Among the {} rows of the dataframe {} rows are unique. But, {} rows are duplicates.'.format(data.shape[0] , (~data.duplicated()).sum(), data.duplicated().sum()))
Among the 56926 rows of the dataframe 42576 rows are unique. But, 14350 rows are duplicates.

For treating these dulicates, we duplicates except for the first occurrence in the dataframe.

In [184]:
data.drop_duplicates(inplace=True);

print('In the revised dataframe, there are total of {} rows and number of dimplicated rows equals to {}.'.format(data.shape[0] ,  data.duplicated().sum()))
In the revised dataframe, there are total of 42576 rows and number of dimplicated rows equals to 0.

3.3 Feature Engineering

Histogram of the no_of_previous_bookings_not_canceled is shown below.

In [38]:
fig = plt.figure(figsize=(40, 4));

# Adds subplot on position 1
fig.add_subplot(121)


# plot the barchart
ax = data['no_of_previous_bookings_not_canceled'].value_counts().plot(kind="bar", rot=90)
# Make twin axis
ax2 = ax.twinx()

# display counts on each bar
for p in ax.patches:
    ax.annotate('{}'.format(p.get_height()), (p.get_x() -0.1, p.get_height()+10) , fontsize=12, weight='bold')

#adding labels
ax.set(xlabel='no of previous bookings not canceled', ylabel='count');
In [39]:
print('The minimum value of variable "no_of_previous_bookings_not_canceled" is {} and the maximum amount is {}.'.format(data['no_of_previous_bookings_not_canceled'].min(),data['no_of_previous_bookings_not_canceled'].max()))
The minimum value of variable "no_of_previous_bookings_not_canceled" is 0 and the maximum amount is 72.

As we can see, the most frequenct value of no_of_previous_bookings_not_canceled is zero (with frequency equal to 41314). This means that most customers has canceled their prior bookings before the current booking. Furthormore, we see that frequency of values other than 0 are very small for this variable. For instance, the 10 most frequent values are shown below:

In [40]:
data['no_of_previous_bookings_not_canceled'].value_counts().head(10)
Out[40]:
0    41314
1      345
2      163
3      116
4       89
5       85
6       52
7       46
8       34
9       32
Name: no_of_previous_bookings_not_canceled, dtype: int64

Graph below shows frequency of all other values of this variable except for 0.

In [41]:
#definig the color
color=sns. set_palette("dark")

#plotting
fig = plt.figure(figsize=(40, 8));

ax=sns.histplot(x='no_of_previous_bookings_not_canceled', data=data[data['no_of_previous_bookings_not_canceled']>0] , hue='booking_status', palette=color);

We can see than although the values of this feature reaches to 72, the frequencies drop significantly after value 0.

In [42]:
temp=data[data['no_of_previous_bookings_not_canceled']==0]['booking_status'].value_counts()

print('Moreover, when "no_of_previous_bookings_not_canceled"= 0, {} of the current bookings has been canceled and {} has not been canceled.\n'.format(temp[1],temp[0]) )

temp=data[data['no_of_previous_bookings_not_canceled']>0]['booking_status'].value_counts()

print('When "no_of_previous_bookings_not_canceled"> 1, only {} of the current bookings has been canceled and the other {} bookings has not been canceled.\n'.format(temp[1],temp[0]) )

temp=data[data['no_of_previous_bookings_not_canceled']>12]['booking_status'].value_counts()

print('Interestingly, when "no_of_previous_bookings_not_canceled"> 12, none of the {} current bookings has been canceled.'.format(temp[0]) )
Moreover, when "no_of_previous_bookings_not_canceled"= 0, 14482 of the current bookings has been canceled and 26832 has not been canceled.

When "no_of_previous_bookings_not_canceled"> 1, only 5 of the current bookings has been canceled and the other 1257 bookings has not been canceled.

Interestingly, when "no_of_previous_bookings_not_canceled"> 12, none of the 216 current bookings has been canceled.

Hence, it seem a good idea to bin the "no_of_previous_bookings_not_canceled" feature into smaller groups.

Data Binning

We bin the no_of_previous_bookings_not_canceled feature into 3 groups of [0, 1], (1, 12], and (12, 72].

In [185]:
#Binnig the variable into 3 bins
data['Binned_no_of_previous_bookings_not_canceled']=pd.cut(data['no_of_previous_bookings_not_canceled'],bins=[-1,1,12,72],labels=['[0, 1]','(1, 12]','(12, 72]'])

#Dropping the previous variable
data=data.drop(['no_of_previous_bookings_not_canceled'], axis=1)

Frequencies of each bin is displayed below:

In [186]:
data['Binned_no_of_previous_bookings_not_canceled'].value_counts()
Out[186]:
[0, 1]      41659
(1, 12]       701
(12, 72]      216
Name: Binned_no_of_previous_bookings_not_canceled, dtype: int64
In [187]:
#changing type of the variable to "object"
data['Binned_no_of_previous_bookings_not_canceled']=data['Binned_no_of_previous_bookings_not_canceled'].astype('object');

The histogram of the new variable with name Binned_no_of_previous_bookings_not_canceled is as:

In [191]:
#plotting
fig = plt.figure(figsize=(5, 4));
ax=sns.histplot(x='Binned_no_of_previous_bookings_not_canceled', data=data);

We also need to convert type of the Binned_no_of_previous_bookings_not_canceled from categorical to object values as follow:

3.4 Outlier detection and treatment

An outlier is a data point that is distant from other similar points. Linear regression is easily impacted by the outliers in the data. Outliers can distort predictions and affect the accuracy so it's important to flag them for review. This is especially the case with regression models.

Outlier detection using IQR

We use IQR, which is the interval going from the 1st quartile to the 3rd quartile of the data in question, and then flag points for investigation if they are outside 1.5 * IQR.

Let us plot the boxplots of all numerical columns to display outliers.

In [47]:
plt.figure(figsize=(20, 30))

# numerical columns
columns=data.dtypes[data.dtypes != 'object'].index 

# plot
for i, variable in enumerate(columns):
    plt.subplot(5, 4, i + 1)
    plt.boxplot(data[variable], whis=1.5)
    plt.tight_layout()
    plt.title(variable)

Calculating fraction of outliers for each variable:

The following function is used to calculate fracttion of outliers for each numerical columns based on IQR.

In [48]:
#Creating a function to calculate fraction of outliers
def frac_outside_IQR(y):
    x=y.to_numpy(dtype=object)
    length = 1.5 * np.diff(np.quantile(x, [.25, .75]))
    frac=round(np.mean(np.abs(x - np.median(x)) > length),2)
    return frac

#Create list of quantitative variables
numeric_columns=data.dtypes[data.dtypes != 'object'].index.tolist()

print('\nfraction of outliers for quantitative variables:')

#Apply the frac_outside_IQR function on the numeric_columns in data set
Out_frac=data[numeric_columns].apply(frac_outside_IQR, axis=0)
Out_var_list=Out_frac[Out_frac>0]
Out_var_list
fraction of outliers for quantitative variables:
Out[48]:
no_of_adults                    0.27
no_of_children                  0.10
no_of_week_nights               0.02
required_car_parking_space      0.03
lead_time                       0.07
repeated_guest                  0.03
no_of_previous_cancellations    0.01
avg_price_per_room              0.08
no_of_special_requests          0.03
dtype: float64

Before treating the outliers, let us look at values of the numeric features that contain outliers.

In [49]:
num_to_display = 20  # defining number of displayed values for each numeric feature
for colname in Out_var_list.index:
    
    val_counts = data[colname].value_counts(dropna=False)  # Show NA counts
    print(f'\n\nnumerical variable= {colname} ') #f-String
    
    if len(val_counts) > num_to_display:
        print(f'Only displaying first {num_to_display} and last {num_to_display} of {len(val_counts)} values.\n') #f-String
        
    print(val_counts.iloc[: num_to_display])

numerical variable= no_of_adults 
2    31069
1     7264
3     4031
0      184
4       28
Name: no_of_adults, dtype: int64


numerical variable= no_of_children 
0     38300
1      2561
2      1673
3        39
9         2
10        1
Name: no_of_children, dtype: int64


numerical variable= no_of_week_nights 
2     11764
1     10906
3      9660
4      4136
0      2797
5      2505
6       301
7       165
8       121
10       94
9        48
11       20
12       16
15       14
14       10
13        9
16        7
17        3
Name: no_of_week_nights, dtype: int64


numerical variable= required_car_parking_space 
0    41113
1     1463
Name: required_car_parking_space, dtype: int64


numerical variable= lead_time 
Only displaying first 20 and last 20 of 397 values.

0     1597
1     1237
3      764
4      762
2      760
5      661
6      623
7      581
8      536
12     497
9      468
14     440
11     438
10     436
13     408
15     392
18     384
16     379
17     378
28     377
Name: lead_time, dtype: int64


numerical variable= repeated_guest 
0    41261
1     1315
Name: repeated_guest, dtype: int64


numerical variable= no_of_previous_cancellations 
0     42132
1       249
2        66
3        47
11       25
4        24
6        16
5        16
13        1
Name: no_of_previous_cancellations, dtype: int64


numerical variable= avg_price_per_room 
Only displaying first 20 and last 20 of 4939 values.

65.00     687
0.00      641
75.00     603
126.00    528
99.00     496
95.00     492
108.00    447
89.10     446
85.00     446
90.00     407
120.00    400
79.20     391
140.00    376
74.80     374
117.00    344
88.00     338
80.00     325
160.00    312
107.10    309
80.75     289
Name: avg_price_per_room, dtype: int64


numerical variable= no_of_special_requests 
0    19228
1    15571
2     6381
3     1230
4      150
5       16
Name: no_of_special_requests, dtype: int64
  • no_of_adults: We have 184 rows correponding to 0 number of adults which does not make sense. The other values of adults number are 1,2 and 3 which are sensible.
  • no_of_children: We have 2 rows correponding to 9 number of children which is very different from other children numbers (0,1,2,3). Also, there are some rows corresponding to 10 children.
  • required_car_parking_space: required_car_parking_space is a binary variable and we will not need an outlier treatment for it as all its values are eather 1 or 0.
  • repeated_guest: repeated_guest is a binary variable and we will not need an outlier treatment for it as all its values are eather 1 or 0.
  • no_of_previous_cancellations: We have 25 rows correponding to 11 number of previous cancelations which does not make sense. Also, there are some rows corresponding to 13 cancelations. The other values of previous cancelations numbers ranges from 0 to 6 which are sensible.
  • no_of_special_requests: Base on IQR, the fraction of outliers for no_of_special_requests is 3%. However, when we check the values of this value which ranges from 0 to 6 we do not see a need to consider the 0.03 fraction as outliers and hence no treatment will be applied on this variable.
  • no_of_week_nights: The values of the no_of_week_nights belongs to a wide range. We will investigate it further to see if there is any need for outlier treatment.
  • lead_time: The values of the lead_time belongs to a wide range. We will investigate it further to see if there is any need for outlier tratment.
  • avg_price_per_room: The values of the no_of_previous_bookings_not_canceled belongs to a wide range. Hence, we use IQR to treat its outliers.

Outlier Treatment

no_of_adults: We change the number of adults from 0 to 1 to treat the 184 rows that correpond 0 number of adults.

In [192]:
#changing the value of no_of_adults from 0 to 1
data.loc[data[data['no_of_adults']==0].index,'no_of_adults']=1

data['no_of_adults'].value_counts()
Out[192]:
2    31069
1     7448
3     4031
4       28
Name: no_of_adults, dtype: int64

no_of_children: We have 2 rows correponding to 9 number of children which is very different from other children numbers. We assign the corresponding values to the maximum number of children which equals to 3.

In [193]:
#changing the value of outlier no_of_children to 3 child
data.loc[data[data['no_of_children']==9].index,'no_of_children']=3
data.loc[data[data['no_of_children']==10].index,'no_of_children']=3

data['no_of_children'].value_counts()
Out[193]:
0    38300
1     2561
2     1673
3       42
Name: no_of_children, dtype: int64

no_of_previous_cancellations: We change the number of previous cancellations from 11 to 5 to treat the 25 rows that correpond to 11 number of cancellations.

In [194]:
#changing the value of no_of_previous_cancellations from 11,13 to 6
data.loc[data[data['no_of_previous_cancellations']==11].index,'no_of_previous_cancellations']=6
data.loc[data[data['no_of_previous_cancellations']==13].index,'no_of_previous_cancellations']=6

data['no_of_previous_cancellations'].value_counts()
Out[194]:
0    42132
1      249
2       66
3       47
6       42
4       24
5       16
Name: no_of_previous_cancellations, dtype: int64

avg_price_per_room: From the gragh below, we can see that the average price can be a good prediction of booking cancelations. The only values that seem to be outliers are the ones with 0 values and greater than 500 amount.

In [195]:
sns.histplot(data=data, x='avg_price_per_room',hue='booking_status');

There are 641 rows in the data that corresponds with room price=zero which does not make sense. Also there are a few rows that show room price=1. We will substitude these room prices with the median room price.

In [196]:
#changing the value of outlier to the median value
data.loc[data[data['avg_price_per_room']<5].index,'avg_price_per_room']=data['avg_price_per_room'].median()
In [197]:
sns.histplot(data=data, x='avg_price_per_room',hue='booking_status');

no_of_week_nights: The histogram of this varibale for values of "no_of_week_nights">5 is plotted below. Using IQR method some of these points are defined as outliers becuase they fall after the upper whisker. However, it seem that these points can give us valueble information and hence we will not consider them as outliers.

In [56]:
sns.histplot(data=data[data['no_of_week_nights']>5], x='no_of_week_nights',hue='booking_status');

lead_time: We treat outliers of "lead_time" by flooring and capping as follows:

In [198]:
def treat_outliers_func(x):
    """
    treats outliers in a variable
    col: str, name of the numerical variable
    df: dataframe
    col: name of the column
    """
    Q1 = x.quantile(0.25)  # 25th quantile
    Q3 = x.quantile(0.75)  # 75th quantile
    IQR = Q3 - Q1
    Lower_Whisker = Q1 - 1.5 * IQR
    Upper_Whisker = Q3 + 1.5 * IQR

    # all the values smaller than Lower_Whisker will be assigned the value of Lower_Whisker
    # all the values greater than Upper_Whisker will be assigned the value of Upper_Whisker
    x = np.clip(x, Lower_Whisker, Upper_Whisker)

    return x


#Create list of quantitative variables
treating_vars=['lead_time']

#Apply the frac_outside_IQR function on the numeric_columns in data set
data[treating_vars]=data[treating_vars].apply(treat_outliers_func, axis=0)

Boxplots of the revised numeric variables is as below.

In [58]:
plt.figure(figsize=(15, 20))

# numerical columns
columns=['lead_time']

# plot
for i, variable in enumerate(columns):
    plt.subplot(5, 4, i + 1)
    plt.boxplot(data[variable], whis=1.5)
    plt.tight_layout()
    plt.title(variable)

3.5 Other Data Manipulations

We change the values of booking_status which is our target variable and represent it with numeric values. If booking_status=0 it indicates that the current booking has not been canceled and if booking_status=1 it shows the booking has been canceled.

In [199]:
data["booking_status"] = data["booking_status"].apply(lambda x: 1 if x == "Canceled" else 0)

3.6 EDA after Data Manipulation

  • It is a good idea to explore the data once again after manipulating it.
In [200]:
data.head()
Out[200]:
no_of_adults no_of_children no_of_weekend_nights no_of_week_nights type_of_meal_plan required_car_parking_space room_type_reserved lead_time arrival_year arrival_month arrival_date market_segment_type repeated_guest no_of_previous_cancellations avg_price_per_room no_of_special_requests booking_status Binned_no_of_previous_bookings_not_canceled
0 2 0 1 2 Meal Plan 1 0 Room_Type 1 224 2017 10 2 Offline 0 0 65.00 0 0 [0, 1]
1 2 0 2 3 Not Selected 0 Room_Type 1 5 2018 11 6 Online 0 0 106.68 1 0 [0, 1]
2 1 0 2 1 Meal Plan 1 0 Room_Type 1 1 2018 2 28 Online 0 0 60.00 0 1 [0, 1]
3 2 0 0 2 Meal Plan 1 0 Room_Type 1 211 2018 5 20 Online 0 0 100.00 0 1 [0, 1]
4 3 0 0 3 Not Selected 0 Room_Type 1 271 2019 7 13 Online 0 0 89.10 2 1 [0, 1]
In [201]:
print(f'There are {data.shape[1]} columns and {data.shape[0]} rows in the revised data set.')  # f-string
There are 18 columns and 42576 rows in the revised data set.

Average, median ,standard deviation, min, max, and 1st and 3rd quantiles of quantitative variables are shown below:

In [62]:
data.describe()
Out[62]:
no_of_adults no_of_children no_of_weekend_nights no_of_week_nights required_car_parking_space lead_time arrival_year arrival_month arrival_date repeated_guest no_of_previous_cancellations avg_price_per_room no_of_special_requests booking_status
count 42576.000000 42576.000000 42576.000000 42576.000000 42576.000000 42576.000000 42576.000000 42576.000000 42576.000000 42576.000000 42576.000000 42576.000000 42576.000000 42576.000000
mean 1.921059 0.141700 0.895270 2.321167 0.034362 75.929843 2018.297891 6.365488 15.682873 0.030886 0.022313 114.036170 0.768109 0.340262
std 0.515768 0.454019 0.887864 1.519328 0.182160 72.739869 0.626126 3.051924 8.813991 0.173011 0.274618 38.364502 0.837264 0.473803
min 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2017.000000 1.000000 1.000000 0.000000 0.000000 6.000000 0.000000 0.000000
25% 2.000000 0.000000 0.000000 1.000000 0.000000 16.000000 2018.000000 4.000000 8.000000 0.000000 0.000000 87.300000 0.000000 0.000000
50% 2.000000 0.000000 1.000000 2.000000 0.000000 53.000000 2018.000000 6.000000 16.000000 0.000000 0.000000 107.000000 1.000000 0.000000
75% 2.000000 0.000000 2.000000 3.000000 0.000000 118.000000 2019.000000 9.000000 23.000000 0.000000 0.000000 135.000000 1.000000 1.000000
max 4.000000 3.000000 8.000000 17.000000 1.000000 271.000000 2019.000000 12.000000 31.000000 1.000000 6.000000 540.000000 5.000000 1.000000

3.7 Splitting Data

One-hot Encodig Categorical Variables

Before we proceed to building models, we'll have to encode categorical features.

In [63]:
#Create list of chategorical variables

object_columns=data.dtypes[data.dtypes == 'object'].index.tolist()

#Creating dummy variables and one-hot encoding for categorical variables
data=pd.get_dummies(data, columns = object_columns, drop_first=True)
data.head()
Out[63]:
no_of_adults no_of_children no_of_weekend_nights no_of_week_nights required_car_parking_space lead_time arrival_year arrival_month arrival_date repeated_guest no_of_previous_cancellations avg_price_per_room no_of_special_requests booking_status type_of_meal_plan_Meal Plan 2 type_of_meal_plan_Meal Plan 3 type_of_meal_plan_Not Selected room_type_reserved_Room_Type 2 room_type_reserved_Room_Type 3 room_type_reserved_Room_Type 4 room_type_reserved_Room_Type 5 room_type_reserved_Room_Type 6 room_type_reserved_Room_Type 7 market_segment_type_Complementary market_segment_type_Corporate market_segment_type_Offline market_segment_type_Online Binned_no_of_previous_bookings_not_canceled_(12, 72] Binned_no_of_previous_bookings_not_canceled_[0, 1]
0 2 0 1 2 0 224 2017 10 2 0 0 65.00 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
1 2 0 2 3 0 5 2018 11 6 0 0 106.68 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1
2 1 0 2 1 0 1 2018 2 28 0 0 60.00 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
3 2 0 0 2 0 211 2018 5 20 0 0 100.00 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
4 3 0 0 3 0 271 2019 7 13 0 0 89.10 2 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1

Defining predictors(x) and target (y) variables

In [64]:
x=data.drop(['booking_status'], axis=1)
y=data[['booking_status']]

Spliting data into train and test datasets

We'll split the data into train and test to be able to evaluate the model that we build on the train data.

In [65]:
# splitting the data in 70:30 ratio for train to test data
x_train1, x_test1, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=1)
In [66]:
print("Number of rows in train data =", x_train1.shape[0]);
print("Number of rows in test data =", x_test1.shape[0]);
Number of rows in train data = 29803
Number of rows in test data = 12773
In [67]:
print("Number (percentage) of bookings in training set:")
print("non-canceled:   {0} ({1:2.2f}%)".format(y_train['booking_status'].value_counts()[1], y_train['booking_status'].value_counts(normalize=True)[1] * 100 ))
print("canceled    :   {0} ({1:2.2f}%)".format(y_train['booking_status'].value_counts()[0], y_train['booking_status'].value_counts(normalize=True)[0] * 100 ))

print("\nNumber (percentage) of bookings in test set:")
print("non-canceled:   {0} ({1:2.2f}%)".format(y_test['booking_status'].value_counts()[1], y_test['booking_status'].value_counts(normalize=True)[1] * 100 ))
print("canceled    :   {0} ({1:2.2f}%)".format(y_test['booking_status'].value_counts()[0], y_test['booking_status'].value_counts(normalize=True)[0] * 100 ))
Number (percentage) of bookings in training set:
non-canceled:   10101 (33.89%)
canceled    :   19702 (66.11%)

Number (percentage) of bookings in test set:
non-canceled:   4386 (34.34%)
canceled    :   8387 (65.66%)

The percentage of canceled and non-canceled bookings in the training and test data sets are almost equal. Hence, both data sets have a good distribution for booking status.

In our model, when variable y=1 it indicates that the room booking is canceled and if y=0 the booking has not been canceled. We aim to build a Logistic Regresion model to make prediction and be able to classify data points. The result of the Regression model is a float number between 0 and 1 that shows predicted probabilty of booking cancelation for each data point. Later, we will define a threshold in to set a threshold for the predicted probabilties and classify data points.

Our model can make wrong predictions as:
  1. Predicting a booking will be canceled (y=1) but in reality it does not (y=0)
  2. Predicting a booking will not be canceled (y=0) but in reality gets canceled (y=1)
Which case is more important?
  • Both the cases are important as:

  • If we predict a booking will be canceled but in reality it does not then the guest will show up at hotel place and demand their reserved room which due to our wrong assumption might not be available. This can make a huge inconveniences for the guest and hotel staff and would damage the hotel reputaion and influence on future revenue.

  • On the contrary, a booking will not be canceled but in reality gets canceled then we will keep the room available for our assumed guest while they don't show up and it constitutes to opportunity loss of revenue.

How to reduce this loss i.e need to reduce False Negatives?
  • f1_score should be maximized, the greater the f1_score higher the chances of identifying both the classes correctly.

First, let's create functions to calculate different performance metrics and confusion matrix

In [68]:
# defining a function to compute different metrics to check performance of a classification model built using statsmodels
def model_performance_classification_statsmodels(
    model, predictors, target, threshold
):
    """
    Function to compute different metrics to check classification model performance

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """

    # checking which probabilities are greater than threshold
    pred_temp = model.predict(predictors) > threshold
    # rounding off the above values to get classes
    pred = np.round(pred_temp)

    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score

    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )

    return df_perf
In [69]:
# defining a function to plot the confusion_matrix of a classification model

def confusion_matrix_statsmodels(model, predictors, target, threshold):
    """
    To plot the confusion_matrix with percentages

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    y_pred = model.predict(predictors)
    y_pred = y_pred.apply(lambda x: 1 if x >= threshold else 0);
    
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)

    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
In [70]:
# defining a function to plot the confusion_matrix of a classification model

def confusion_matrix_statsmodels(model, predictors, target, threshold):
    """
    To plot the confusion_matrix with percentages

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    y_pred = model.predict(predictors)
    y_pred = y_pred.apply(lambda x: 1 if x >= threshold else 0);
    
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}  ".format(item) + "{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)

    plt.figure(figsize=(6, 4))
    annot_kws = {"va": 'bottom'}
    sns.heatmap(cm, annot=labels, fmt="", annot_kws=annot_kws)
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
    fig.tight_layout()

4.1 Checking Multicollinearity

  • In order to make statistical inferences from a logistic regression model, it is important to ensure that there is no multicollinearity present in the data.

  • Multicollinearity occurs when predictor variables in a regression model are correlated. This correlation is a problem because predictor variables should be independent. If the correlation between variables is high, it can cause problems when we fit the model and interpret the results. When we have multicollinearity in the linear model, the coefficients that the model suggests are unreliable.

  • There are different ways of detecting (or testing) multicollinearity. One such way is by using the Variance Inflation Factor, or VIF.

  • Variance Inflation Factor (VIF): Variance inflation factors measure the inflation in the variances of the regression parameter estimates due to collinearities that exist among the predictors. It is a measure of how much the variance of the estimated regression coefficient $\beta_k$ is "inflated" by the existence of correlation among the predictor variables in the model.

    • If VIF is 1, then there is no correlation among the $k$th predictor and the remaining predictor variables, and hence, the variance of $\beta_k$ is not inflated at all.
  • General rule of thumb:

    • If VIF is between 1 and 5, then there is low multicollinearity.
    • If VIF is between 5 and 10, we say there is moderate multicollinearity.
    • If VIF is exceeding 10, it shows signs of high multicollinearity.

Usinng the following function, we calculate VIF for each predictor variable

In [71]:
#defining a function to check VIF
def checking_vif(x):
    vif = pd.DataFrame()
    vif["feature"] = x.columns

    # calculating VIF for each feature
    vif["VIF"] = [
        variance_inflation_factor(x.values, i)
        for i in range(len(x.columns))
    ]
    return vif
In [72]:
df_VIF=checking_vif(data)
df_VIF
Out[72]:
feature VIF
0 no_of_adults 21.270397
1 no_of_children 2.338050
2 no_of_weekend_nights 2.186861
3 no_of_week_nights 3.798235
4 required_car_parking_space 1.075546
5 lead_time 3.024091
6 arrival_year 444.238337
7 arrival_month 5.598493
8 arrival_date 4.181171
9 repeated_guest 3.832452
10 no_of_previous_cancellations 1.685980
11 avg_price_per_room 21.457739
12 no_of_special_requests 2.337926
13 booking_status 2.323988
14 type_of_meal_plan_Meal Plan 2 1.149036
15 type_of_meal_plan_Meal Plan 3 1.022610
16 type_of_meal_plan_Not Selected 1.596097
17 room_type_reserved_Room_Type 2 1.111084
18 room_type_reserved_Room_Type 3 1.001038
19 room_type_reserved_Room_Type 4 1.837850
20 room_type_reserved_Room_Type 5 1.148424
21 room_type_reserved_Room_Type 6 2.221981
22 room_type_reserved_Room_Type 7 1.179322
23 market_segment_type_Complementary 3.669181
24 market_segment_type_Corporate 11.284938
25 market_segment_type_Offline 32.322405
26 market_segment_type_Online 186.406590
27 Binned_no_of_previous_bookings_not_canceled_(1... 1.660425
28 Binned_no_of_previous_bookings_not_canceled_[0... 168.138548

Listing variables with VIF greater than 5.

In [73]:
df_VIF[df_VIF['VIF']>5]
Out[73]:
feature VIF
0 no_of_adults 21.270397
6 arrival_year 444.238337
7 arrival_month 5.598493
11 avg_price_per_room 21.457739
24 market_segment_type_Corporate 11.284938
25 market_segment_type_Offline 32.322405
26 market_segment_type_Online 186.406590
28 Binned_no_of_previous_bookings_not_canceled_[0... 168.138548

The VIF values for dummy variables can be ignored. Among the quantitative variables, "no_of_adults", "arrival_year", "arrival_month", "avg_price_per_room", "market_segment_type_Corporate", "market_segment_type_Offline", "market_segment_type_Online", and "Binned_no_of_previous_bookings_notcanceled[0,1]" have VIF greater than 5.

Removing multicollinearity

To remove multicollinearity

  1. Drop every column one by one that has a VIF score greater than 5.
  2. Look at the adjusted R-squared and RMSE of all these models.
  3. Drop the variable that makes the least change in adjusted R-squared.
  4. Check the VIF scores again.
  5. Continue till you get all VIF scores under 5.

Let's first define a function to calculate model performances for the case that each of the the high VIF variables is dropped.

In [74]:
def treating_multicollinearity(predictors, target, high_vif_columns):
    """
    Checking the effect of dropping the columns showing high multicollinearity
    on model performance (adj. R-squared and RMSE)

    predictors: independent variables
    target: dependent variable
    high_vif_columns: columns having high VIF
    """
    # empty lists to store performance measures
    Recall = []
    Accuracy = []
    F1 = []
    Precision=[]

    # build the models by dropping one of the high VIF columns at a time
    # store the performance measures in the lists defined previously
    for cols in high_vif_columns:
        # defining the new train set
        train = predictors.loc[:, ~predictors.columns.str.startswith(cols)]
        
        #Build the model
        lg = LogisticRegression(solver="newton-cg", random_state=1)
        model = lg.fit(predictors, target);

        # create the dataframe including performance measures
        Data_performance = model_performance_classification_statsmodels(model, predictors, target, threshold=0.5)

        # adding Accuracy, Recall, Precision, and F1 to the lists
        Accuracy.append(Data_performance.iloc[0,0])
        Recall.append(Data_performance.iloc[0,1])
        Precision.append(Data_performance.iloc[0,2])
        F1.append(Data_performance.iloc[0,3])
        
       

    # creating a dataframe for the results
    temp = pd.DataFrame(
        {
            "col": high_vif_columns,
            "Accuracy": Accuracy,
            "Recall": Recall,
            "Precision": Precision,
            "F1": F1,
        }
    ).sort_values(by="F1", ascending=False)
    temp.reset_index(drop=True, inplace=True)

    return temp
In [75]:
#List of variables with VIF greater than 5
col_list = df_VIF[df_VIF['VIF']>5]['feature'].tolist()

#Performance measures
Performances = treating_multicollinearity(x_train1, y_train, col_list)
Performances
Out[75]:
col Accuracy Recall Precision F1
0 no_of_adults 0.792638 0.612019 0.732204 0.666739
1 arrival_year 0.792638 0.612019 0.732204 0.666739
2 arrival_month 0.792638 0.612019 0.732204 0.666739
3 avg_price_per_room 0.792638 0.612019 0.732204 0.666739
4 market_segment_type_Corporate 0.792638 0.612019 0.732204 0.666739
5 market_segment_type_Offline 0.792638 0.612019 0.732204 0.666739
6 market_segment_type_Online 0.792638 0.612019 0.732204 0.666739
7 Binned_no_of_previous_bookings_not_canceled_[0... 0.792638 0.612019 0.732204 0.666739

From the table above, we can see that dropping any of the high VIF variables has the same influence on the predictive power of the model. Hence, we drop market_segment_type_Online bacauese it has the greatest VIF value.

In [76]:
col_to_drop = "market_segment_type_Online"

#Dropping the column
x_train2 = x_train1.loc[:, ~x_train1.columns.str.startswith(col_to_drop)]
x_test2 = x_test1.loc[:, ~x_test1.columns.str.startswith(col_to_drop)]

# Check VIF now
df_VIF = checking_vif(x_train2)
print("Variables with VIF>5 after dropping variable", col_to_drop)
df_VIF[df_VIF['VIF']>5]
Variables with VIF>5 after dropping variable market_segment_type_Online
Out[76]:
feature VIF
0 no_of_adults 20.978684
6 arrival_year 208.881329
7 arrival_month 5.622900
11 avg_price_per_room 21.076308
26 Binned_no_of_previous_bookings_not_canceled_[0... 164.404593
In [77]:
df_VIF[df_VIF['VIF']>5].iloc[4,0]
Out[77]:
'Binned_no_of_previous_bookings_not_canceled_[0, 1]'

Quantitative variables "no_of_adults", "arrival_year", "arrival_month", "avg_price_per_room", and "Binned_no_of_previous_bookings_notcanceled" have VIF greater than 5.

In [78]:
#List of variables with VIF greater than 5
col_list = df_VIF[df_VIF['VIF']>5]['feature'].tolist()

#Performance measures
Performances = treating_multicollinearity(x_train2, y_train, col_list)
Performances
Out[78]:
col Accuracy Recall Precision F1
0 no_of_adults 0.792504 0.611128 0.732353 0.666271
1 arrival_year 0.792504 0.611128 0.732353 0.666271
2 arrival_month 0.792504 0.611128 0.732353 0.666271
3 avg_price_per_room 0.792504 0.611128 0.732353 0.666271
4 Binned_no_of_previous_bookings_not_canceled_[0... 0.792504 0.611128 0.732353 0.666271

From the table above, we can see that dropping any of the high VIF variables has the same influence on the predictive power of the model. Hence, we drop Binned_no_of_previous_bookings_notcanceled[0, 1] bacauese it has the greatest VIF value.

In [79]:
col_to_drop = "Binned_no_of_previous_bookings_not_canceled_[0, 1]"
x_train3 = x_train2.loc[:, ~x_train2.columns.str.startswith(col_to_drop)]
x_test3 = x_test2.loc[:, ~x_test2.columns.str.startswith(col_to_drop)]

# Check VIF now
df_VIF= checking_vif(x_train3)
print("Variables with VIF>5 after dropping variable", col_to_drop)
df_VIF[df_VIF['VIF']>5]
Variables with VIF>5 after dropping variable Binned_no_of_previous_bookings_not_canceled_[0, 1]
Out[79]:
feature VIF
0 no_of_adults 20.978602
6 arrival_year 40.323680
7 arrival_month 5.617588
11 avg_price_per_room 21.076296

We have two quantitative variables with VIF greater than 5.

In [80]:
#List of variables with VIF greater than 5
col_list = df_VIF[df_VIF['VIF']>5]['feature'].tolist()

#Performance measures
res = treating_multicollinearity(x_train3, y_train, col_list)
res
Out[80]:
col Accuracy Recall Precision F1
0 no_of_adults 0.792504 0.611227 0.732297 0.666307
1 arrival_year 0.792504 0.611227 0.732297 0.666307
2 arrival_month 0.792504 0.611227 0.732297 0.666307
3 avg_price_per_room 0.792504 0.611227 0.732297 0.666307

We will drop arrival_year.

In [81]:
col_to_drop = "arrival_year"
x_train4 = x_train3.loc[:, ~x_train3.columns.str.startswith(col_to_drop)]
x_test4 = x_test3.loc[:, ~x_test3.columns.str.startswith(col_to_drop)]

# Check VIF now
df_VIF= checking_vif(x_train4)
print("Variables with VIF>5 after dropping variable", col_to_drop)
df_VIF[df_VIF['VIF']>5]
Variables with VIF>5 after dropping variable arrival_year
Out[81]:
feature VIF
0 no_of_adults 15.953988
6 arrival_month 5.274954
10 avg_price_per_room 15.686732

We will no_of_adults and check VIF again.

In [82]:
col_to_drop = "no_of_adults"
x_train5 = x_train4.loc[:, ~x_train4.columns.str.startswith(col_to_drop)]
x_test5 = x_test4.loc[:, ~x_test4.columns.str.startswith(col_to_drop)]

# Check VIF now
df_VIF= checking_vif(x_train5)

print("Variables with VIF>5 after dropping variable", col_to_drop)
df_VIF[df_VIF['VIF']>5]
Variables with VIF>5 after dropping variable no_of_adults
Out[82]:
feature VIF
5 arrival_month 5.227348
9 avg_price_per_room 9.526456

We will avg_price_per_room and check VIF again.

In [83]:
col_to_drop = "avg_price_per_room"
x_train6 = x_train5.loc[:, ~x_train5.columns.str.startswith(col_to_drop)]
x_test6 = x_test5.loc[:, ~x_test5.columns.str.startswith(col_to_drop)]

# Check VIF now
df_VIF= checking_vif(x_train6)
print("VIF after dropping variable", col_to_drop)
df_VIF
VIF after dropping variable avg_price_per_room
Out[83]:
feature VIF
0 no_of_children 2.267860
1 no_of_weekend_nights 2.105835
2 no_of_week_nights 3.320606
3 required_car_parking_space 1.063584
4 lead_time 2.315860
5 arrival_month 4.083858
6 arrival_date 3.283407
7 repeated_guest 1.892913
8 no_of_previous_cancellations 1.637599
9 no_of_special_requests 1.927352
10 type_of_meal_plan_Meal Plan 2 1.084679
11 type_of_meal_plan_Meal Plan 3 1.027440
12 type_of_meal_plan_Not Selected 1.405489
13 room_type_reserved_Room_Type 2 1.098272
14 room_type_reserved_Room_Type 3 1.001259
15 room_type_reserved_Room_Type 4 1.458147
16 room_type_reserved_Room_Type 5 1.043975
17 room_type_reserved_Room_Type 6 2.019465
18 room_type_reserved_Room_Type 7 1.099376
19 market_segment_type_Complementary 1.160410
20 market_segment_type_Corporate 1.595325
21 market_segment_type_Offline 1.288934
22 Binned_no_of_previous_bookings_not_canceled_(1... 1.537779

The used quantitative predictors have no multicollinearity and the assumption is satisfied.

4.2 Building a Logistic Regression model

Since we have dropped multiple columns to eliminate multicolinearity, we need to check again for duplicates in the training set.

In [84]:
print('There are {} number of duplicates in the x_train6 that needs to be treated.'.format(x_train6.duplicated().sum()))
There are 1044 number of duplicates in the x_train6 that needs to be treated.

Dropping the duplicated rows in both x_train and y_train:

In [85]:
# Finding list of the duplicate rows indexes
dup_index=x_train6.loc[x_train6.duplicated(), :].index.tolist()

# droping duplicates from train and test data
x_train7=x_train6.drop(index=dup_index, axis=0)
x_test7=x_test6

# droping duplicates from train data
y_train2=y_train.drop(index=dup_index, axis=0)
y_test2=y_test

print('There are {} number of duplicates in the x_train7.'.format(x_train7.duplicated().sum()))
There are 0 number of duplicates in the x_train7.

Logistic Regression (with Sklearn library)

In [86]:
# There are different solvers available in Sklearn logistic regression
# The newton-cg solver is faster for high-dimensional data

lg = LogisticRegression(solver="newton-cg", random_state=1)
model = lg.fit(x_train7, y_train2)

Logistic Regression (with statsmodels library)

In [87]:
# adding constant
x_train8 = sm.add_constant(x_train7)
x_test8 = sm.add_constant(x_test7)

Our final training and testing sets for X variable are x_train8 and x_test8. Our final training and testing sets for Y variable are y_train2 and y_test2.

In [88]:
# fitting logistic regression model
logit = sm.Logit(y_train2, x_train8.astype(float))
lg = logit.fit(disp=False)

print(lg.summary())
                           Logit Regression Results                           
==============================================================================
Dep. Variable:         booking_status   No. Observations:                28759
Model:                          Logit   Df Residuals:                    28735
Method:                           MLE   Df Model:                           23
Date:                Fri, 17 Sep 2021   Pseudo R-squ.:                  0.3116
Time:                        18:36:41   Log-Likelihood:                -12688.
converged:                      False   LL-Null:                       -18431.
Covariance Type:            nonrobust   LLR p-value:                     0.000
========================================================================================================================
                                                           coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------------------------------------------
const                                                   -1.0100      0.058    -17.341      0.000      -1.124      -0.896
no_of_children                                           0.2710      0.048      5.615      0.000       0.176       0.366
no_of_weekend_nights                                    -0.0074      0.018     -0.415      0.678      -0.043       0.028
no_of_week_nights                                        0.0443      0.011      4.222      0.000       0.024       0.065
required_car_parking_space                              -1.3675      0.115    -11.861      0.000      -1.593      -1.142
lead_time                                                0.0167      0.000     64.688      0.000       0.016       0.017
arrival_month                                           -0.0107      0.005     -1.993      0.046      -0.021      -0.000
arrival_date                                            -0.0030      0.002     -1.721      0.085      -0.006       0.000
repeated_guest                                          -3.3087      0.573     -5.773      0.000      -4.432      -2.185
no_of_previous_cancellations                             0.4696      0.176      2.673      0.008       0.125       0.814
no_of_special_requests                                  -1.2392      0.023    -53.178      0.000      -1.285      -1.193
type_of_meal_plan_Meal Plan 2                            0.3840      0.077      4.975      0.000       0.233       0.535
type_of_meal_plan_Meal Plan 3                            0.2135   1.89e+04   1.13e-05      1.000   -3.71e+04    3.71e+04
type_of_meal_plan_Not Selected                           0.2119      0.041      5.117      0.000       0.131       0.293
room_type_reserved_Room_Type 2                          -0.4477      0.125     -3.594      0.000      -0.692      -0.204
room_type_reserved_Room_Type 3                           0.3677      1.389      0.265      0.791      -2.355       3.091
room_type_reserved_Room_Type 4                           0.3177      0.040      7.913      0.000       0.239       0.396
room_type_reserved_Room_Type 5                           0.6756      0.107      6.313      0.000       0.466       0.885
room_type_reserved_Room_Type 6                           0.5989      0.112      5.360      0.000       0.380       0.818
room_type_reserved_Room_Type 7                           1.0134      0.185      5.486      0.000       0.651       1.375
market_segment_type_Complementary                      -20.3102   2180.240     -0.009      0.993   -4293.502    4252.881
market_segment_type_Corporate                           -1.0286      0.120     -8.568      0.000      -1.264      -0.793
market_segment_type_Offline                             -2.5286      0.063    -39.939      0.000      -2.653      -2.405
Binned_no_of_previous_bookings_not_canceled_(12, 72]   -59.3915   5.99e+12  -9.92e-12      1.000   -1.17e+13    1.17e+13
========================================================================================================================

Observations

Note: Since the multicollinearity is removed from the data the model coefficients and p-values are reliable.

  • Positive values of the coefficient show that that probability of booking cancelation increases with the increase of corresponding attribute value.

  • Negative values of the coefficient shows that probability of booking cancelation decreases with the increase of corresponding attribute value.

  • p-value of a variable indicates if the variable is significant or not. If we consider the significance level to be 0.05 (5%), then any variable with a p-value less than 0.05 would be considered significant. For instance, p-value of "type_of_meal_plan_Meal Plan 3" equals to 1 and hence it is considered as insignificant.

Dropping insignificant variables

We will drop variables one by one by repeatedly doing following:

  • Build a model, check the p-values of the variables, and drop the column with the highest p-value.
    • Create a new model without the dropped feature, check the p-values of the variables, and drop the column with the highest p-value.
    • Repeat the above two steps till there are no columns with p-value > 0.05.
In [89]:
# running a loop to drop variables with high p-value

# initial list of columns
cols = x_train8.columns.tolist()

# setting an initial max p-value
max_p_value = 1

while len(cols) > 0:
    # defining the train set
    x_train_aux = x_train8[cols]

    # fitting the model
    model = sm.Logit(y_train2, x_train_aux).fit(disp=False)

    # getting the p-values and the maximum p-value
    p_values = model.pvalues
    max_p_value = max(p_values)

    # name of the variable with maximum p-value
    feature_with_p_max = p_values.idxmax()

    if max_p_value > 0.05:
        cols.remove(feature_with_p_max)
    else:
        break

selected_features = cols
print(selected_features)
['const', 'no_of_children', 'no_of_week_nights', 'required_car_parking_space', 'lead_time', 'arrival_month', 'repeated_guest', 'no_of_previous_cancellations', 'no_of_special_requests', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Corporate', 'market_segment_type_Offline']

We extract the data correspondent to the selected features defined above.

In [90]:
x_train9 = x_train8[selected_features]
x_test9 = x_test8[selected_features]

We rebuild regression model using statmodels.

In [91]:
logit = sm.Logit(y_train2, x_train9.astype(float))
lg2 = logit.fit(disp=False)

print(lg2.summary())
                           Logit Regression Results                           
==============================================================================
Dep. Variable:         booking_status   No. Observations:                28759
Model:                          Logit   Df Residuals:                    28741
Method:                           MLE   Df Model:                           17
Date:                Fri, 17 Sep 2021   Pseudo R-squ.:                  0.3081
Time:                        18:36:44   Log-Likelihood:                -12752.
converged:                       True   LL-Null:                       -18431.
Covariance Type:            nonrobust   LLR p-value:                     0.000
==================================================================================================
                                     coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------------
const                             -1.0884      0.050    -21.647      0.000      -1.187      -0.990
no_of_children                     0.2744      0.048      5.733      0.000       0.181       0.368
no_of_week_nights                  0.0476      0.010      4.666      0.000       0.028       0.068
required_car_parking_space        -1.3634      0.115    -11.844      0.000      -1.589      -1.138
lead_time                          0.0168      0.000     65.262      0.000       0.016       0.017
arrival_month                     -0.0132      0.005     -2.465      0.014      -0.024      -0.003
repeated_guest                    -3.5789      0.579     -6.186      0.000      -4.713      -2.445
no_of_previous_cancellations       0.4060      0.179      2.262      0.024       0.054       0.758
no_of_special_requests            -1.2335      0.023    -53.057      0.000      -1.279      -1.188
type_of_meal_plan_Meal Plan 2      0.3855      0.077      4.999      0.000       0.234       0.537
type_of_meal_plan_Not Selected     0.2340      0.041      5.663      0.000       0.153       0.315
room_type_reserved_Room_Type 2    -0.4553      0.124     -3.673      0.000      -0.698      -0.212
room_type_reserved_Room_Type 4     0.3304      0.040      8.255      0.000       0.252       0.409
room_type_reserved_Room_Type 5     0.6616      0.106      6.230      0.000       0.453       0.870
room_type_reserved_Room_Type 6     0.6062      0.111      5.463      0.000       0.389       0.824
room_type_reserved_Room_Type 7     0.9361      0.180      5.205      0.000       0.584       1.289
market_segment_type_Corporate     -0.9939      0.120     -8.306      0.000      -1.228      -0.759
market_segment_type_Offline       -2.5115      0.063    -39.683      0.000      -2.636      -2.387
==================================================================================================

Now the p-value of none of the features is greater than 0.05, so we'll consider the features in x_train9 as the final features for classification and and lg2 as our final model.

4.3 Model Performance Evaluation

In [92]:
print("Training performance:")
train_sklearn =model_performance_classification_statsmodels(lg2, x_train9, y_train2, threshold=0.5)
train_sklearn
Training performance:
Out[92]:
Accuracy Recall Precision F1
0 0.783859 0.589704 0.723052 0.649605
In [93]:
# creating confusion matrix
confusion_matrix_statsmodels(lg2, x_train9, y_train2, threshold=0.5)
In [94]:
print("Test performance:")
test_sklearn= model_performance_classification_statsmodels(lg2, x_test9, y_test2, threshold=0.5)
test_sklearn
Test performance:
Out[94]:
Accuracy Recall Precision F1
0 0.780553 0.586639 0.722144 0.647377
In [95]:
# creating confusion matrix
confusion_matrix_statsmodels(lg2, x_test9, y_test2, threshold=0.5)

ROC-AUC Gragh

  • ROC-AUC is plotted on both the training set and the test set as below:
In [96]:
logit_roc_auc_train = roc_auc_score(y_train2, lg2.predict(x_train9))
fpr, tpr, thresholds = roc_curve(y_train2, lg2.predict(x_train9))

plt.figure(figsize=(7, 5))

plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver Operating Characteristic- Training Set")
plt.legend(loc="lower right")
plt.show()
In [97]:
logit_roc_auc_test = roc_auc_score(y_test2, lg2.predict(x_test9))
fpr, tpr, thresholds = roc_curve(y_test2, lg2.predict(x_test9))

plt.figure(figsize=(7, 5))

plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_test)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver Operating Characteristic- Test Set")
plt.legend(loc="lower right")
plt.show()
  • Our Logistic Regression model is giving a good performance on both training and test sets.

Model Performance Improvement

  • Let's see if the f1 score can be improved further, by changing the model threshold using AUC-ROC Curve.

Optimal threshold using AUC-ROC curve

In [98]:
# Optimal threshold as per AUC-ROC curve
# The optimal cut off would be where TPR is high and FPR is low
fpr, tpr, thresholds = roc_curve(y_train2, lg2.predict(x_train9))

optimal_idx = np.argmax(tpr - fpr)
optimal_threshold_auc_roc = thresholds[optimal_idx]
print(optimal_threshold_auc_roc)
0.3314785710330322
In [99]:
print("\nTraining performance with optimum threshold:")
train_threshold_roc_train= model_performance_classification_statsmodels(lg2, x_train9, y_train2, threshold=optimal_threshold_auc_roc)
train_threshold_roc_train
Training performance with optimum threshold:
Out[99]:
Accuracy Recall Precision F1
0 0.771967 0.783338 0.632823 0.700082
In [100]:
# creating confusion matrix
confusion_matrix_statsmodels(lg2,  x_train9, y_train2, threshold=optimal_threshold_auc_roc)
In [101]:
print("\nTest performance with optimum threshold:")
test_threshold_roc_train= model_performance_classification_statsmodels(lg2, x_test9, y_test2, threshold=optimal_threshold_auc_roc)
test_threshold_roc_train
Test performance with optimum threshold:
Out[101]:
Accuracy Recall Precision F1
0 0.771314 0.781122 0.635975 0.701115
In [102]:
# creating confusion matrix
confusion_matrix_statsmodels(lg2,  x_test9, y_test2, threshold=optimal_threshold_auc_roc)
Observations:
  • Precision of model has decreased. Also, Accuracy is slightly decreased. However, we can see that Recall and F1 have increased.
  • Performance metrices of training data and test data almost are equal. This indicates that the Logestic Regression model has been able to present data without the noise.
  • Overal, the model is giving a good performance.

Let's use Precision-Recall Curve and see if we can find a better threshold.

In [103]:
# Function to plot Precision-Recall Curve 
def plot_prec_recall_vs_tresh(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="precision")
    plt.plot(thresholds, recalls[:-1], "g--", label="recall")
    plt.xlabel("Threshold")
    plt.legend(loc="upper left")
    plt.ylim([0, 1])
In [104]:
#predicting y
y_scores = lg2.predict(x_train9)

#Calculating Precision and Recall for different thresholds
prec, rec, tre = precision_recall_curve(y_train2, y_scores,)

#Plotting the Precision_Recall Curve
plt.figure(figsize=(10, 7))
plt.title("Precision-Recall Curve- Training Set")
plot_prec_recall_vs_tresh(prec, rec, tre)
In [105]:
#predicting y
y_scores_test = lg2.predict(x_test9)

#Calculating Precision and Recall for different thresholds
prec, rec, tre = precision_recall_curve(y_test2, y_scores_test,)

#Plotting the Precision_Recall Curve
plt.figure(figsize=(10, 7))
plt.title("Precision-Recall Curve- Training Set")
plot_prec_recall_vs_tresh(prec, rec, tre)
  • At the threshold of 0.42, we get balanced recall and precision on both the training and test data.
In [106]:
# setting the threshold
optimal_threshold_curve = 0.42
In [107]:
print("\nTraining performance with optimum threshold:")
train_threshold_curve_train= model_performance_classification_statsmodels(lg2, x_train9, y_train2, threshold=optimal_threshold_curve)
train_threshold_curve_train
Training performance with optimum threshold:
Out[107]:
Accuracy Recall Precision F1
0 0.785806 0.682325 0.685694 0.684005
In [108]:
# creating confusion matrix
confusion_matrix_statsmodels(lg2, x_train9, y_train2, threshold=optimal_threshold_curve)
In [109]:
print("\nTest performance with optimum threshold:")
test_threshold_curve_train= model_performance_classification_statsmodels(lg2, x_test9, y_test2, threshold=optimal_threshold_curve)
test_threshold_curve_train
Test performance with optimum threshold:
Out[109]:
Accuracy Recall Precision F1
0 0.78525 0.678979 0.690471 0.684676
  • Model is performing well on both training and test set and Recal, Precision, and F1 metrices are balanced.
In [110]:
# creating confusion matrix
confusion_matrix_statsmodels(lg2, x_test9, y_test2, threshold=optimal_threshold_curve)

Model Performance Summary

In [111]:
# training performance comparison

models_train_comp_df = pd.concat(
    [
        train_sklearn.T,
        test_sklearn.T,
        train_threshold_roc_train.T,
        test_threshold_roc_train.T,
        train_threshold_curve_train.T,
        test_threshold_curve_train.T,
    ],
    axis=1,
)
models_train_comp_df.columns = [
    "Lg model on training set- Threshold= Default",
    "Lg model on test set- Threshold= Default",
    "Lg model on training set- Threshold= 0.33",
    "Lg model on test set- Threshold= 0.33",
    "Lg model on training set- Threshold= 0.42",
    "Lg model on test set- Threshold= 0.42",
]

print("\nTraining performance comparison:")
models_train_comp_df.T
Training performance comparison:
Out[111]:
Accuracy Recall Precision F1
Lg model on training set- Threshold= Default 0.783859 0.589704 0.723052 0.649605
Lg model on test set- Threshold= Default 0.780553 0.586639 0.722144 0.647377
Lg model on training set- Threshold= 0.33 0.771967 0.783338 0.632823 0.700082
Lg model on test set- Threshold= 0.33 0.771314 0.781122 0.635975 0.701115
Lg model on training set- Threshold= 0.42 0.785806 0.682325 0.685694 0.684005
Lg model on test set- Threshold= 0.42 0.785250 0.678979 0.690471 0.684676
Observations:
  • Values of the peformance measures on the test data are almost the same as the values of the peformance measures on the training data. This indicates that the model appropriately eliminates noise.
  • Accuracy of the model is almost the same for the defualt amount of threshold as well as the thresholds equal to 0.33 and 0.42.
  • By increasing the value of threshold, precision increases and recall and f1 decrease.
  • When threshold equals to 0.33, f1 is at its highest value.
  • When threshold is set to 0.42, we obtain a balance between precision and recall values.

4.4 Final Model Summary

Features in x_train9 is considered as the final train set and lg2 is our final model.

In [112]:
logit = sm.Logit(y_train2, x_train9.astype(float))
lg2 = logit.fit(disp=False)

print(lg2.summary())
                           Logit Regression Results                           
==============================================================================
Dep. Variable:         booking_status   No. Observations:                28759
Model:                          Logit   Df Residuals:                    28741
Method:                           MLE   Df Model:                           17
Date:                Fri, 17 Sep 2021   Pseudo R-squ.:                  0.3081
Time:                        18:36:51   Log-Likelihood:                -12752.
converged:                       True   LL-Null:                       -18431.
Covariance Type:            nonrobust   LLR p-value:                     0.000
==================================================================================================
                                     coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------------
const                             -1.0884      0.050    -21.647      0.000      -1.187      -0.990
no_of_children                     0.2744      0.048      5.733      0.000       0.181       0.368
no_of_week_nights                  0.0476      0.010      4.666      0.000       0.028       0.068
required_car_parking_space        -1.3634      0.115    -11.844      0.000      -1.589      -1.138
lead_time                          0.0168      0.000     65.262      0.000       0.016       0.017
arrival_month                     -0.0132      0.005     -2.465      0.014      -0.024      -0.003
repeated_guest                    -3.5789      0.579     -6.186      0.000      -4.713      -2.445
no_of_previous_cancellations       0.4060      0.179      2.262      0.024       0.054       0.758
no_of_special_requests            -1.2335      0.023    -53.057      0.000      -1.279      -1.188
type_of_meal_plan_Meal Plan 2      0.3855      0.077      4.999      0.000       0.234       0.537
type_of_meal_plan_Not Selected     0.2340      0.041      5.663      0.000       0.153       0.315
room_type_reserved_Room_Type 2    -0.4553      0.124     -3.673      0.000      -0.698      -0.212
room_type_reserved_Room_Type 4     0.3304      0.040      8.255      0.000       0.252       0.409
room_type_reserved_Room_Type 5     0.6616      0.106      6.230      0.000       0.453       0.870
room_type_reserved_Room_Type 6     0.6062      0.111      5.463      0.000       0.389       0.824
room_type_reserved_Room_Type 7     0.9361      0.180      5.205      0.000       0.584       1.289
market_segment_type_Corporate     -0.9939      0.120     -8.306      0.000      -1.228      -0.759
market_segment_type_Offline       -2.5115      0.063    -39.683      0.000      -2.636      -2.387
==================================================================================================

Now, we can move towards the prediction part.

In [113]:
# predictions on the test set
pred = lg2.predict(x_test9)

df_pred_test = pd.DataFrame({"Actual": y_test2['booking_status'], "Predicted (probabiity)": pred})
df_pred_test.head(10)
Out[113]:
Actual Predicted (probabiity)
50934 0 0.111886
42266 0 0.100541
34994 0 0.097136
14547 1 0.514953
32158 0 0.048607
53837 0 0.423543
14845 0 0.475726
5656 1 0.248464
35144 1 0.833112
15586 1 0.942546
  • In the above table, Predicted values indicate probability of booking cancelation which is shown by value 1.

  • For instance, in the first row we can see that the actual value of y is 0 which indicates that the booking has not been canceled. Our predicted value for y is 0.11 which indicates that based on the proposed Logistic Regression model the probability of booking cancelation is predicted to be 11%.

We can also visualize and compare the actual values and the predicted probability of occurance as bar graph below:

In [114]:
df1 = df_pred_test.head(25)
df1.plot(kind="bar", figsize=(15, 7));

Now, we apply threshold=0.42 to see the results of predictions:

In [115]:
df_pred_test["Predicted Value"] = df_pred_test["Predicted (probabiity)"].apply(lambda x: 1 if x >= 0.42 else 0)
df_pred_test
Out[115]:
Actual Predicted (probabiity) Predicted Value
50934 0 0.111886 0
42266 0 0.100541 0
34994 0 0.097136 0
14547 1 0.514953 1
32158 0 0.048607 0
... ... ... ...
37411 1 0.669139 1
55236 1 0.449811 1
18959 1 0.568832 1
4160 1 0.340234 0
27641 1 0.869162 1

12773 rows × 3 columns

In [116]:
df1 = df_pred_test.head(25).drop('Predicted (probabiity)',axis=1)
df1.plot(kind="bar", figsize=(15, 7));

5. Classification- Decision Tree

5.1 Building a Decision Tree model

  • We build our model using the DecisionTreeClassifier function. Using default 'gini' criteria to split.
  • If the frequency of class A is 10% and the frequency of class B is 90%, then class B will become the dominant class and the decision tree will become biased toward the dominant classes.

  • In this case, we can pass a dictionary {0:0.15,1:0.85} to the model to specify the weight of each class and the decision tree will give more weightage to class 1.

  • class_weight is a hyperparameter for the decision tree classifier.

In [117]:
model = DecisionTreeClassifier(
    criterion="gini", class_weight={0: 0.15, 1: 0.85}, random_state=1)
In [118]:
model.fit(x_train9, y_train2);
In [119]:
# defining a function to compute different metrics to check performance of a classification model built using statsmodels
def model_performance_classification_decision_tree(
    model, predictors, target
):
    """
    Function to compute different metrics to check classification model performance

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """

    # checking which probabilities are greater than threshold
    pred = model.predict(predictors)

    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score

    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )

    return df_perf
In [120]:
def confusion_matrix_sklearn(model, predictors, target):
    """
    To plot the confusion_matrix with percentages

    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    y_pred = model.predict(predictors)
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}  ".format(item) + "{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)

    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")

Checking model performance on training set

In [121]:
confusion_matrix_sklearn(model, x_train9, y_train2)
plt.title('confusion matrix for training data');
In [122]:
decision_tree_perf_train1 = model_performance_classification_decision_tree(model, x_train9, y_train2)
decision_tree_perf_train1
Out[122]:
Accuracy Recall Precision F1
0 0.963803 0.999284 0.904242 0.94939
  • Model is able to perfectly classify most of the data points on the training set.
  • We have less than 1 percent errors on the training set which means that most of the entries has been classified correctly.

Checking model performance on test set

In [123]:
confusion_matrix_sklearn(model, x_test9, y_test2)
plt.title('confusion matrix for test data');
In [124]:
decision_tree_perf_test1 = model_performance_classification_decision_tree(model, x_test9, y_test2)
decision_tree_perf_test1
Out[124]:
Accuracy Recall Precision F1
0 0.760197 0.676471 0.643461 0.659553
  • We can see that Accuracy, Recall, Precision, and F1 measures on the test data have all decresed compared to the training data.
  • There is a huge disparity in performance of model on training set and test set, which suggests that the model is overfiiting.
  • As we know a decision tree will continue to grow and classify each data point correctly if no restrictions are applied as the trees will learn all the patterns in the training set.
  • This generally leads to overfitting of the model as Decision Tree will perform well on the training set but will fail to replicate the performance on the test set.

5.2 Prunning the Tree

Since the model is highly overfitting, we need to prune the tree to eliminate capturing the noise of the training data in the model.

Pre_Pruning using GridSearch for Tuning Hyperparameters of the Model

  • Hyperparameter tuning is also tricky in the sense that there is no direct way to calculate how a change in the hyperparameter value will reduce the loss of our model, hence we resort to experimentation. i.e we'll use Grid search
  • Grid search is a tuning technique that attempts to compute the optimum values of hyperparameters.
  • It is an exhaustive search that is performed on specific parameter values of a model.
  • The parameters of the estimator/model used to apply these methods are optimized by cross-validated grid-search over a parameter grid.
In [125]:
# Choose the type of classifier.
estimator = DecisionTreeClassifier(random_state=1, class_weight={0: 0.15, 1: 0.85})

# Grid of parameters to choose from
parameters = {
    "max_depth": [5, 10, 20, None],
    "criterion": ["entropy", "gini"],
    "splitter": ["best", "random"],
    "min_impurity_decrease": [0.00004, 0.0001, 0.01],
}

# Type of scoring used to compare parameter combinations
scorer = make_scorer(f1_score)

# Run the grid search
grid_obj = GridSearchCV(estimator, parameters, scoring=scorer, cv=5);
grid_obj = grid_obj.fit(x_train9, y_train2)

# Set the clf to the best combination of parameters
model_2 = grid_obj.best_estimator_

# Fit the best algorithm to the data.
model_2.fit(x_train9, y_train2);

Checking model performance on training set

Confusion matrix for training set is:

In [126]:
confusion_matrix_sklearn(model_2, x_train9, y_train2)
plt.title('confusion matrix for training data');
In [127]:
decision_tree_perf_train2 = model_performance_classification_decision_tree(model_2, x_train9, y_train2)
decision_tree_perf_train2
Out[127]:
Accuracy Recall Precision F1
0 0.747279 0.981373 0.575052 0.725176
  • Model is able to perfectly classify most of the data points on the training set.
  • We have less than 1 percent errors on the training set which means that most of the entries has been classified correctly.

Checking model performance on test set

In [128]:
confusion_matrix_sklearn(model_2, x_test9, y_test2)
plt.title('confusion matrix for test data');
In [129]:
decision_tree_perf_test2 = model_performance_classification_decision_tree(model_2, x_test9, y_test2)
decision_tree_perf_test2
Out[129]:
Accuracy Recall Precision F1
0 0.717373 0.931829 0.552447 0.693652
  • We can see that Accuracy, Recall, Precision, and F1 measures on the test data have all decresed compared to the training data and again the model is overfiiting.

Visualizing the Decision Tree

In [130]:
# creating a list of column names
feature_names = x_train9.columns.to_list()
In [131]:
# plotting the decision tree
plt.figure(figsize=(20, 30))
out = tree.plot_tree(
    model_2,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=None,
)
# below code will add arrows to the decision tree split if they are missing
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
In [132]:
# Text report showing the rules of a decision tree -

print(tree.export_text(model_2, feature_names=feature_names, show_weights=True))
|--- lead_time <= 25.50
|   |--- no_of_special_requests <= 1.50
|   |   |--- lead_time <= 5.50
|   |   |   |--- repeated_guest <= 0.50
|   |   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |--- weights: [26.55, 0.00] class: 0
|   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |   |   |--- lead_time <= 2.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 <= 0.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 7 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 7 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 2.55] class: 1
|   |   |   |   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 >  0.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.95, 9.35] class: 1
|   |   |   |   |   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  2.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.35, 12.75] class: 1
|   |   |   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.80, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |--- weights: [6.00, 0.00] class: 0
|   |   |   |   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 6.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |--- arrival_month >  6.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |   |   |--- weights: [10.50, 1.70] class: 0
|   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |--- weights: [9.60, 0.00] class: 0
|   |   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |   |--- lead_time <= 4.50
|   |   |   |   |   |   |--- no_of_week_nights <= 10.50
|   |   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |   |--- weights: [12.60, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [111.45, 28.05] class: 0
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [8.25, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |--- weights: [10.35, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_week_nights >  10.50
|   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |--- lead_time >  4.50
|   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 6.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  6.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.35, 0.00] class: 0
|   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |--- weights: [1.65, 0.00] class: 0
|   |   |   |--- repeated_guest >  0.50
|   |   |   |   |--- weights: [72.45, 0.00] class: 0
|   |   |--- lead_time >  5.50
|   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |--- lead_time <= 24.50
|   |   |   |   |   |   |--- weights: [49.50, 0.00] class: 0
|   |   |   |   |   |--- lead_time >  24.50
|   |   |   |   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |   |   |   |--- weights: [0.00, 1.70] class: 1
|   |   |   |   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |   |   |   |--- market_segment_type_Corporate <= 0.50
|   |   |   |   |   |   |   |   |--- repeated_guest <= 0.50
|   |   |   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.25, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.85] class: 0
|   |   |   |   |   |   |   |   |--- repeated_guest >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- market_segment_type_Corporate >  0.50
|   |   |   |   |   |   |   |   |--- repeated_guest <= 0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 19.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 12.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  12.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |--- lead_time >  19.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.25, 6.80] class: 1
|   |   |   |   |   |   |   |   |--- repeated_guest >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [10.95, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |--- repeated_guest <= 0.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [3.45, 19.55] class: 1
|   |   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 10.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  10.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |--- repeated_guest >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [8.10, 0.00] class: 0
|   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |--- weights: [11.85, 0.00] class: 0
|   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |--- lead_time <= 24.50
|   |   |   |   |   |   |   |--- weights: [44.40, 0.00] class: 0
|   |   |   |   |   |   |--- lead_time >  24.50
|   |   |   |   |   |   |   |--- weights: [1.05, 1.70] class: 1
|   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |--- arrival_month <= 4.50
|   |   |   |   |   |   |--- lead_time <= 23.50
|   |   |   |   |   |   |   |--- weights: [28.20, 2.55] class: 0
|   |   |   |   |   |   |--- lead_time >  23.50
|   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |--- weights: [3.30, 0.85] class: 0
|   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.30, 0.85] class: 1
|   |   |   |   |   |--- arrival_month >  4.50
|   |   |   |   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |--- lead_time <= 8.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  8.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.60, 3.40] class: 1
|   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |--- lead_time <= 20.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 18.50
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- lead_time >  18.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.20, 3.40] class: 1
|   |   |   |   |   |   |   |   |--- lead_time >  20.50
|   |   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |   |   |   |--- weights: [6.30, 0.00] class: 0
|   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |--- weights: [29.25, 0.85] class: 0
|   |--- no_of_special_requests >  1.50
|   |   |--- no_of_week_nights <= 4.50
|   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |--- weights: [218.40, 0.00] class: 0
|   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |--- lead_time <= 8.50
|   |   |   |   |   |--- weights: [7.95, 0.00] class: 0
|   |   |   |   |--- lead_time >  8.50
|   |   |   |   |   |--- lead_time <= 11.50
|   |   |   |   |   |   |--- weights: [1.35, 2.55] class: 1
|   |   |   |   |   |--- lead_time >  11.50
|   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |--- weights: [6.90, 0.00] class: 0
|   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |--- lead_time <= 17.50
|   |   |   |   |   |   |   |   |--- weights: [0.45, 1.70] class: 1
|   |   |   |   |   |   |   |--- lead_time >  17.50
|   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |--- no_of_week_nights >  4.50
|   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |--- weights: [1.50, 0.00] class: 0
|   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |--- lead_time <= 18.50
|   |   |   |   |   |   |--- lead_time <= 3.50
|   |   |   |   |   |   |   |--- weights: [1.20, 0.00] class: 0
|   |   |   |   |   |   |--- lead_time >  3.50
|   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |--- weights: [2.85, 9.35] class: 1
|   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |--- lead_time >  18.50
|   |   |   |   |   |   |--- weights: [1.20, 0.00] class: 0
|   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |--- weights: [2.10, 0.00] class: 0
|--- lead_time >  25.50
|   |--- lead_time <= 150.50
|   |   |--- no_of_special_requests <= 0.50
|   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |--- market_segment_type_Corporate <= 0.50
|   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 <= 0.50
|   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |--- weights: [61.50, 442.00] class: 1
|   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |--- weights: [186.30, 1841.10] class: 1
|   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 >  0.50
|   |   |   |   |   |   |   |--- weights: [4.95, 202.30] class: 1
|   |   |   |   |   |--- market_segment_type_Corporate >  0.50
|   |   |   |   |   |   |--- lead_time <= 44.50
|   |   |   |   |   |   |   |--- lead_time <= 26.50
|   |   |   |   |   |   |   |   |--- weights: [1.20, 2.55] class: 1
|   |   |   |   |   |   |   |--- lead_time >  26.50
|   |   |   |   |   |   |   |   |--- lead_time <= 33.50
|   |   |   |   |   |   |   |   |   |--- weights: [3.90, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  33.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 38.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 3.40] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  38.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.25, 0.85] class: 0
|   |   |   |   |   |   |--- lead_time >  44.50
|   |   |   |   |   |   |   |--- lead_time <= 70.00
|   |   |   |   |   |   |   |   |--- arrival_month <= 10.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.90, 9.35] class: 1
|   |   |   |   |   |   |   |   |--- arrival_month >  10.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  70.00
|   |   |   |   |   |   |   |   |--- lead_time <= 86.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  86.50
|   |   |   |   |   |   |   |   |   |--- repeated_guest <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 114.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 4.25] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  114.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- repeated_guest >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
|   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |--- room_type_reserved_Room_Type 7 <= 0.50
|   |   |   |   |   |   |--- weights: [13.50, 0.85] class: 0
|   |   |   |   |   |--- room_type_reserved_Room_Type 7 >  0.50
|   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |--- lead_time <= 87.50
|   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |--- lead_time <= 64.00
|   |   |   |   |   |   |   |   |   |--- lead_time <= 35.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 1.70] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  35.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  64.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.30, 5.10] class: 1
|   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 27.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.65, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  27.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 79.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  79.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.65, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [2.85, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |--- lead_time <= 30.50
|   |   |   |   |   |   |   |   |--- weights: [10.35, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  30.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 7 <= 0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 57.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 40.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  40.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [22.95, 2.55] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  57.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 73.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  73.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 7 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |--- weights: [28.05, 0.85] class: 0
|   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |--- lead_time <= 47.50
|   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  47.50
|   |   |   |   |   |   |   |   |--- weights: [0.30, 0.85] class: 1
|   |   |   |   |--- lead_time >  87.50
|   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 111.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 92.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  92.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 10.20] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  111.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.20, 0.00] class: 0
|   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 5.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 13
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  5.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 3.40] class: 1
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_children <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.85, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_children >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 10.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [3.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  10.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |--- weights: [4.80, 0.00] class: 0
|   |   |--- no_of_special_requests >  0.50
|   |   |   |--- no_of_special_requests <= 1.50
|   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |--- lead_time <= 99.00
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 9.00
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [20.70, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.35, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  9.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.70] class: 1
|   |   |   |   |   |   |   |--- lead_time >  99.00
|   |   |   |   |   |   |   |   |--- weights: [2.25, 14.45] class: 1
|   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 6.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 128.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 17
|   |   |   |   |   |   |   |   |   |--- lead_time >  128.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 6.80] class: 1
|   |   |   |   |   |   |   |   |--- arrival_month >  6.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 5.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 14
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 16.15] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  5.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.25, 25.50] class: 1
|   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |--- lead_time <= 100.00
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 5.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 93.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [27.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  93.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  5.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 2.55] class: 1
|   |   |   |   |   |   |   |   |--- lead_time >  100.00
|   |   |   |   |   |   |   |   |   |--- weights: [2.55, 24.65] class: 1
|   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |--- weights: [23.10, 0.00] class: 0
|   |   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |   |--- lead_time <= 91.50
|   |   |   |   |   |   |--- no_of_week_nights <= 7.50
|   |   |   |   |   |   |   |--- weights: [41.25, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_week_nights >  7.50
|   |   |   |   |   |   |   |--- weights: [0.15, 0.85] class: 1
|   |   |   |   |   |--- lead_time >  91.50
|   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |--- arrival_month <= 6.50
|   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 94.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  94.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [5.40, 9.35] class: 1
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.30, 2.55] class: 1
|   |   |   |   |   |   |   |--- arrival_month >  6.50
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.85, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 122.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.70] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  122.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |--- no_of_special_requests >  1.50
|   |   |   |   |--- lead_time <= 90.50
|   |   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |   |--- lead_time <= 89.50
|   |   |   |   |   |   |   |--- weights: [217.05, 0.00] class: 0
|   |   |   |   |   |   |--- lead_time >  89.50
|   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.15, 0.85] class: 1
|   |   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |   |--- lead_time <= 82.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 71.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 30.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  30.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [3.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  71.50
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 1.70] class: 1
|   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.50, 7.65] class: 1
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 32.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  32.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.40, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  82.50
|   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |   |--- weights: [10.80, 0.00] class: 0
|   |   |   |   |--- lead_time >  90.50
|   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 4.50
|   |   |   |   |   |   |   |   |   |--- weights: [14.70, 56.95] class: 1
|   |   |   |   |   |   |   |   |--- arrival_month >  4.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 6.80] class: 1
|   |   |   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 115.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  115.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 7.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  7.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 1.70] class: 1
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |   |--- weights: [3.60, 0.00] class: 0
|   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |--- weights: [4.05, 0.85] class: 0
|   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |--- weights: [25.65, 0.00] class: 0
|   |--- lead_time >  150.50
|   |   |--- no_of_special_requests <= 2.50
|   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |--- weights: [2.70, 1328.55] class: 1
|   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 <= 0.50
|   |   |   |   |   |   |   |   |--- market_segment_type_Corporate <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.20, 38.25] class: 1
|   |   |   |   |   |   |   |   |--- market_segment_type_Corporate >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |--- lead_time <= 268.50
|   |   |   |   |   |   |   |   |--- lead_time <= 199.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.65, 1.70] class: 1
|   |   |   |   |   |   |   |   |--- lead_time >  199.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.50, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  268.50
|   |   |   |   |   |   |   |   |--- weights: [0.30, 3.40] class: 1
|   |   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |--- weights: [36.30, 1206.15] class: 1
|   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |--- repeated_guest <= 0.50
|   |   |   |   |   |   |   |   |--- weights: [11.40, 136.00] class: 1
|   |   |   |   |   |   |   |--- repeated_guest >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |--- arrival_month <= 6.50
|   |   |   |   |   |   |   |--- weights: [6.75, 114.75] class: 1
|   |   |   |   |   |   |--- arrival_month >  6.50
|   |   |   |   |   |   |   |--- lead_time <= 176.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 173.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [4.35, 8.50] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  173.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.65, 16.15] class: 1
|   |   |   |   |   |   |   |--- lead_time >  176.50
|   |   |   |   |   |   |   |   |--- weights: [11.70, 113.90] class: 1
|   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |--- lead_time <= 242.00
|   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 235.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [18.15, 46.75] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  235.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 6.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  6.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |   |   |--- weights: [7.80, 31.45] class: 1
|   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 5.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 219.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  219.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.70] class: 1
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.60, 18.70] class: 1
|   |   |   |   |   |   |   |--- no_of_week_nights >  5.50
|   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |--- weights: [2.25, 0.00] class: 0
|   |   |   |   |--- lead_time >  242.00
|   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |--- weights: [7.80, 95.20] class: 1
|   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |--- weights: [1.35, 0.00] class: 0
|   |   |--- no_of_special_requests >  2.50
|   |   |   |--- weights: [31.65, 0.00] class: 0

In [133]:
# Displaying important features in tree
importances = model_2.feature_importances_
indices = np.argsort(importances)

plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
  • According to the decision tree model, lead time is the most important variable for predicting a booking cancelation.

Post_Pruning using Cost Complexity

The DecisionTreeClassifier provides parameters such as min_samples_leaf and max_depth to prevent a tree from overfiting. Cost complexity pruning provides another option to control the size of a tree. In DecisionTreeClassifier, this pruning technique is parameterized by the cost complexity parameter, ccp_alpha. Greater values of ccp_alpha increase the number of nodes pruned. Here we only show the effect of ccp_alpha on regularizing the trees and how to choose a ccp_alpha based on validation scores.

Total impurity of leaves vs effective alphas of pruned tree


Minimal cost complexity pruning recursively finds the node with the "weakest link". The weakest link is characterized by an effective alpha, where the nodes with the smallest effective alpha are pruned first. To get an idea of what values of ccp_alpha could be appropriate, scikit-learn provides DecisionTreeClassifier.cost_complexity_pruning_path that returns the effective alphas and the corresponding total leaf impurities at each step of the pruning process.

In [134]:
model_3 = DecisionTreeClassifier(random_state=1, class_weight={0: 0.15, 1: 0.85})
path = model_3.cost_complexity_pruning_path(x_train9, y_train2)
ccp_alphas, impurities = path.ccp_alphas, path.impurities

pd.DataFrame(path).head()
Out[134]:
ccp_alphas impurities
0 0.000000e+00 0.022912
1 -9.486769e-20 0.022912
2 -4.065758e-20 0.022912
3 -4.065758e-20 0.022912
4 -4.065758e-20 0.022912
In [135]:
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
Observation:

As alpha increases, more of the tree is pruned, which increases the total impurity of its leaves. The last value in ccp_alphas is the alpha value that prunes the whole tree, leaving the tree, with one node. Hence, we remove it.

In [136]:
#removig 
ccp_alphas = ccp_alphas[:-1]

ccp_alphas=np.delete(ccp_alphas, np.where(ccp_alphas < 0))

Next, we train a decision tree using the effective alphas.

In [137]:
models_Pool = []
for ccp_alpha in ccp_alphas:
    model = DecisionTreeClassifier(
        random_state=1, ccp_alpha=ccp_alpha, class_weight={0: 0.15, 1: 0.85}
    )
    model.fit(x_train9, y_train2)
    models_Pool.append(model)
print(
    "Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
        models_Pool[-1].tree_.node_count, ccp_alphas[-1]
    )
)
Number of nodes in the last tree is: 3 with ccp_alpha: 0.017272867266456854

We show that the number of nodes and tree depth decreases as alpha increases.

In [138]:
node_counts = [model.tree_.node_count for model in models_Pool]
depth = [model.tree_.max_depth for model in models_Pool]

#ploting
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()

Calculating Recall and Precision for trainig data:

In [139]:
recall_train = []
precision_train=[]

for model in models_Pool:
    pred_train = model.predict(x_train9)
    values_train1 = recall_score(y_train2, pred_train)# to compute Recall
    values_train2 = precision_score(y_train2, pred_train)# to compute Precision
    recall_train.append(values_train1)
    precision_train.append(values_train2)

Calculating Recall and Precision for test data:

In [140]:
recall_test = []
precision_test=[]

for model in models_Pool:
    pred_test = model.predict(x_test9)
    values_test1 = recall_score(y_test2, pred_test)# to compute Recall
    values_test2 = precision_score(y_test2, pred_test)# to compute Precision
    recall_test.append(values_test1)
    precision_test.append(values_test2)
In [141]:
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("Recall")
ax.set_title("Recall vs alpha for training and testing sets")
ax.plot(
    ccp_alphas, recall_train, marker="o", label="recall_train", drawstyle="steps-post",
)
ax.plot(ccp_alphas, recall_test, marker="o", label="recall_test", drawstyle="steps-post")
ax.legend()
plt.show()

When alpha increases, at first Recall increases but as alpha passes 0.009 we see a sharp drop in the Recall values.

In [142]:
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("Precision")
ax.set_title("Precision vs alpha for training and testing sets")
ax.plot(
    ccp_alphas, precision_train, marker="o", label="precision_train", drawstyle="steps-post",
)
ax.plot(ccp_alphas, precision_test, marker="o", label="precision_test", drawstyle="steps-post")
ax.legend()
plt.show()

As alpha increases, the Precision value drops significantly.

Regarding the values of both Recall and Precision, it seems that appropriate amount of alpha should be defined by maximum Precision amount. This is due to the fact that the for a wide range of alpha values Recall remains in an appropriate level.

Next, we find a model with the best amount of alpha:

In [143]:
# Finding the model where we get highest Precision on the test set
index_best_model = np.argmax(precision_test)
model_3 = models_Pool[index_best_model]
print(model_3)
DecisionTreeClassifier(ccp_alpha=8.333080516396911e-06,
                       class_weight={0: 0.15, 1: 0.85}, random_state=1)
In [144]:
model_3.fit(x_train9, y_train2);

Checking model performance on training set

Confusion matrix for training set is:

In [145]:
confusion_matrix_sklearn(model_3, x_train9, y_train2)
plt.title('confusion matrix for training data');
In [146]:
decision_tree_perf_train3 = model_performance_classification_decision_tree(model_3, x_train9, y_train2)
decision_tree_perf_train3
Out[146]:
Accuracy Recall Precision F1
0 0.962829 0.999079 0.902051 0.948089
  • Model is able to perfectly classify most of the data points on the training set.
  • We have less than 1 percent errors on the training set which means that most of the entries has been classified correctly.
  • Amount of Precision and Recall are almost balanced.

Checking model performance on test set

In [147]:
confusion_matrix_sklearn(model_3, x_test9, y_test2)
plt.title('confusion matrix for test data');
In [148]:
decision_tree_perf_test3 = model_performance_classification_decision_tree(model_3, x_test9, y_test2)
decision_tree_perf_test3
Out[148]:
Accuracy Recall Precision F1
0 0.76145 0.678979 0.645008 0.661557
  • We can see that Accuracy, Recall, Precision, and F1 measures on the test data have all decresed compared to the training data which indicates amounts of model overfiiting.

Visualizing the Post_Prunned Decision Tree

In [149]:
# creating a list of column names
feature_names = x_train9.columns.to_list()
In [150]:
# plotting the decision tree
plt.figure(figsize=(20, 30))
out = tree.plot_tree(
    model_3,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=None,
)
# below code will add arrows to the decision tree split if they are missing
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
In [151]:
# Text report showing the rules of a decision tree -

print(tree.export_text(model_3, feature_names=feature_names, show_weights=True))
|--- lead_time <= 25.50
|   |--- no_of_special_requests <= 1.50
|   |   |--- lead_time <= 5.50
|   |   |   |--- repeated_guest <= 0.50
|   |   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |--- weights: [26.55, 0.00] class: 0
|   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |   |   |--- lead_time <= 2.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 <= 0.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 7 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 15
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 7 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 >  0.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  2.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 17
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.80, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |--- weights: [6.00, 0.00] class: 0
|   |   |   |   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 6.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- arrival_month >  6.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [7.65, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |--- weights: [9.60, 0.00] class: 0
|   |   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |   |--- lead_time <= 4.50
|   |   |   |   |   |   |--- no_of_week_nights <= 10.50
|   |   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |   |--- weights: [12.60, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 17
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [8.25, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |--- weights: [10.35, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_week_nights >  10.50
|   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |--- lead_time >  4.50
|   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 6.50
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |--- arrival_month >  6.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.35, 0.00] class: 0
|   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |--- weights: [1.65, 0.00] class: 0
|   |   |   |--- repeated_guest >  0.50
|   |   |   |   |--- weights: [72.45, 0.00] class: 0
|   |   |--- lead_time >  5.50
|   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |--- lead_time <= 24.50
|   |   |   |   |   |   |--- weights: [49.50, 0.00] class: 0
|   |   |   |   |   |--- lead_time >  24.50
|   |   |   |   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |   |   |   |--- weights: [0.00, 1.70] class: 1
|   |   |   |   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |   |   |   |--- market_segment_type_Corporate <= 0.50
|   |   |   |   |   |   |   |   |--- repeated_guest <= 0.50
|   |   |   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 20
|   |   |   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.25, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |--- repeated_guest >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- market_segment_type_Corporate >  0.50
|   |   |   |   |   |   |   |   |--- repeated_guest <= 0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 19.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 12.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  12.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |--- lead_time >  19.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.70] class: 1
|   |   |   |   |   |   |   |   |--- repeated_guest >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [10.95, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |--- repeated_guest <= 0.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 24
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 10.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  10.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 13
|   |   |   |   |   |   |   |   |--- repeated_guest >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [8.10, 0.00] class: 0
|   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |--- weights: [11.85, 0.00] class: 0
|   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |--- lead_time <= 24.50
|   |   |   |   |   |   |   |--- weights: [44.40, 0.00] class: 0
|   |   |   |   |   |   |--- lead_time >  24.50
|   |   |   |   |   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |--- arrival_month <= 4.50
|   |   |   |   |   |   |--- lead_time <= 23.50
|   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [15.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 11.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [4.80, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  11.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 18.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 17.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  17.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  18.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.20, 0.00] class: 0
|   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 10.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  10.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.20, 0.00] class: 0
|   |   |   |   |   |   |--- lead_time >  23.50
|   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |--- lead_time <= 24.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  24.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.50, 0.00] class: 0
|   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |--- arrival_month >  4.50
|   |   |   |   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |--- lead_time <= 8.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  8.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 21.50
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 3.40] class: 1
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  21.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |--- lead_time <= 20.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 18.50
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- lead_time >  18.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |--- lead_time >  20.50
|   |   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |   |   |   |--- weights: [6.30, 0.00] class: 0
|   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |--- lead_time <= 10.50
|   |   |   |   |   |   |--- lead_time <= 9.50
|   |   |   |   |   |   |   |--- weights: [4.80, 0.00] class: 0
|   |   |   |   |   |   |--- lead_time >  9.50
|   |   |   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.85] class: 1
|   |   |   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
|   |   |   |   |   |--- lead_time >  10.50
|   |   |   |   |   |   |--- weights: [22.80, 0.00] class: 0
|   |--- no_of_special_requests >  1.50
|   |   |--- no_of_week_nights <= 4.50
|   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |--- weights: [218.40, 0.00] class: 0
|   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |--- lead_time <= 8.50
|   |   |   |   |   |--- weights: [7.95, 0.00] class: 0
|   |   |   |   |--- lead_time >  8.50
|   |   |   |   |   |--- lead_time <= 11.50
|   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 9.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  9.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.70] class: 1
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |--- lead_time >  11.50
|   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |--- weights: [6.90, 0.00] class: 0
|   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |--- lead_time <= 17.50
|   |   |   |   |   |   |   |   |--- lead_time <= 15.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  15.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 16.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  16.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 6.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  6.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  17.50
|   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |--- no_of_week_nights >  4.50
|   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |--- weights: [1.50, 0.00] class: 0
|   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |--- lead_time <= 18.50
|   |   |   |   |   |   |--- lead_time <= 3.50
|   |   |   |   |   |   |   |--- weights: [1.20, 0.00] class: 0
|   |   |   |   |   |   |--- lead_time >  3.50
|   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 <= 0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 6.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 5.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  5.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  6.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 7 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 7 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |--- lead_time >  18.50
|   |   |   |   |   |   |--- weights: [1.20, 0.00] class: 0
|   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |--- weights: [2.10, 0.00] class: 0
|--- lead_time >  25.50
|   |--- lead_time <= 150.50
|   |   |--- no_of_special_requests <= 0.50
|   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |--- market_segment_type_Corporate <= 0.50
|   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 <= 0.50
|   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 27.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  27.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 22
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 42.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.50, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  42.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 79.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 44.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 14
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  44.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 18
|   |   |   |   |   |   |   |   |   |--- lead_time >  79.50
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 13
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 26
|   |   |   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 16
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_children <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 29
|   |   |   |   |   |   |   |   |   |   |--- no_of_children >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 40.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 13
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |--- lead_time >  40.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 22
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 18
|   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 >  0.50
|   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |   |--- no_of_children <= 1.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_children >  1.00
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 96.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  96.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.55] class: 1
|   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |--- lead_time <= 124.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |--- lead_time >  124.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 4.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 138.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  138.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  4.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 126.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  126.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |--- market_segment_type_Corporate >  0.50
|   |   |   |   |   |   |--- lead_time <= 44.50
|   |   |   |   |   |   |   |--- lead_time <= 26.50
|   |   |   |   |   |   |   |   |--- no_of_previous_cancellations <= 0.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 7.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  7.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |--- no_of_previous_cancellations >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  26.50
|   |   |   |   |   |   |   |   |--- lead_time <= 33.50
|   |   |   |   |   |   |   |   |   |--- weights: [3.90, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  33.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 38.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |--- lead_time >  38.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 40.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  40.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |--- lead_time >  44.50
|   |   |   |   |   |   |   |--- lead_time <= 70.00
|   |   |   |   |   |   |   |   |--- arrival_month <= 10.00
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_previous_cancellations <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_previous_cancellations >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 4.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.40] class: 1
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |--- arrival_month >  10.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  70.00
|   |   |   |   |   |   |   |   |--- lead_time <= 86.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  86.50
|   |   |   |   |   |   |   |   |   |--- repeated_guest <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 114.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  114.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- repeated_guest >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
|   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |--- room_type_reserved_Room_Type 7 <= 0.50
|   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |--- weights: [9.90, 0.00] class: 0
|   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |--- weights: [2.40, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |--- room_type_reserved_Room_Type 7 >  0.50
|   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |--- lead_time <= 87.50
|   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |--- lead_time <= 64.00
|   |   |   |   |   |   |   |   |   |--- lead_time <= 35.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  35.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  64.00
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 4.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  4.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 74.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  74.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.40] class: 1
|   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 27.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.65, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  27.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 79.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 16
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  79.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.65, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [2.85, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |--- lead_time <= 30.50
|   |   |   |   |   |   |   |   |--- weights: [10.35, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  30.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 7 <= 0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 57.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 40.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 13
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  40.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |--- lead_time >  57.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 73.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 12
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  73.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 7 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |--- lead_time <= 60.00
|   |   |   |   |   |   |   |   |   |--- weights: [3.75, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  60.00
|   |   |   |   |   |   |   |   |   |--- lead_time <= 63.00
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  63.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.40, 0.00] class: 0
|   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |--- weights: [21.30, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |--- lead_time <= 47.50
|   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  47.50
|   |   |   |   |   |   |   |   |--- lead_time <= 53.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- lead_time >  53.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |--- lead_time >  87.50
|   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 148.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 24
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  148.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 111.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 92.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  92.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |--- lead_time >  111.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.20, 0.00] class: 0
|   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 5.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 18
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  5.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_children <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.85, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_children >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 10.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [3.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- arrival_month >  10.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |--- weights: [4.80, 0.00] class: 0
|   |   |--- no_of_special_requests >  0.50
|   |   |   |--- no_of_special_requests <= 1.50
|   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |--- lead_time <= 99.00
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 9.00
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [20.70, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.35, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  9.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.70] class: 1
|   |   |   |   |   |   |   |--- lead_time >  99.00
|   |   |   |   |   |   |   |   |--- lead_time <= 121.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 117.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.55] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  117.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  121.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 145.00
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 131.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  131.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 5.95] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  145.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 6.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 128.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 18
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 26
|   |   |   |   |   |   |   |   |   |--- lead_time >  128.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 16
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |--- arrival_month >  6.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 5.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 36
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  5.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |--- lead_time <= 100.00
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 5.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 93.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [27.00, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  93.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  5.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 52.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  52.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  100.00
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 6.00
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 143.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  143.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  6.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |--- weights: [23.10, 0.00] class: 0
|   |   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |   |--- lead_time <= 91.50
|   |   |   |   |   |   |--- no_of_week_nights <= 7.50
|   |   |   |   |   |   |   |--- weights: [41.25, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_week_nights >  7.50
|   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 <= 0.50
|   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |--- lead_time >  91.50
|   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |--- arrival_month <= 6.50
|   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 94.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  94.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 18
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 4.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  4.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.55] class: 1
|   |   |   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_month >  6.50
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.85, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 122.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.70] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  122.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |--- no_of_special_requests >  1.50
|   |   |   |   |--- lead_time <= 90.50
|   |   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |   |--- lead_time <= 89.50
|   |   |   |   |   |   |   |--- weights: [217.05, 0.00] class: 0
|   |   |   |   |   |   |--- lead_time >  89.50
|   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |   |--- lead_time <= 82.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 71.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 30.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  30.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [3.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  71.50
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 18
|   |   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 32.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  32.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.40, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  82.50
|   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |   |--- weights: [10.80, 0.00] class: 0
|   |   |   |   |--- lead_time >  90.50
|   |   |   |   |   |--- no_of_special_requests <= 2.50
|   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 4.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 100.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  100.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 129.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 13
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  129.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |--- arrival_month >  4.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 22
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 115.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 17
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  115.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 13
|   |   |   |   |   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 2.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 128.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 106.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  106.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |--- lead_time >  128.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  2.50
|   |   |   |   |   |   |   |   |   |--- weights: [3.60, 0.00] class: 0
|   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |--- lead_time <= 126.50
|   |   |   |   |   |   |   |   |--- weights: [2.55, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  126.50
|   |   |   |   |   |   |   |   |--- lead_time <= 128.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- lead_time >  128.00
|   |   |   |   |   |   |   |   |   |--- weights: [1.50, 0.00] class: 0
|   |   |   |   |   |--- no_of_special_requests >  2.50
|   |   |   |   |   |   |--- weights: [25.65, 0.00] class: 0
|   |--- lead_time >  150.50
|   |   |--- no_of_special_requests <= 2.50
|   |   |   |--- market_segment_type_Offline <= 0.50
|   |   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |--- arrival_month <= 1.50
|   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 200.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 185.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.40] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  185.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  200.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 11.05] class: 1
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 3.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  3.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |--- arrival_month >  1.50
|   |   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |   |--- market_segment_type_Corporate <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1060.80] class: 1
|   |   |   |   |   |   |   |   |--- market_segment_type_Corporate >  0.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 5.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.40] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  5.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 175.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.55] class: 1
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  175.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 255.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  255.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 <= 0.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 <= 0.50
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 158.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  158.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 5 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |--- lead_time <= 268.50
|   |   |   |   |   |   |   |   |--- lead_time <= 199.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 197.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 170.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  170.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.20, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  197.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- lead_time >  199.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.50, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  268.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.55] class: 1
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.85] class: 1
|   |   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |   |--- type_of_meal_plan_Not Selected <= 0.50
|   |   |   |   |   |   |--- arrival_month <= 9.50
|   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 198.50
|   |   |   |   |   |   |   |   |   |   |--- market_segment_type_Corporate <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 21
|   |   |   |   |   |   |   |   |   |   |--- market_segment_type_Corporate >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  198.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 14
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 6 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 5.95] class: 1
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_special_requests <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |   |--- no_of_special_requests >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 115.60] class: 1
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 215.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  215.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 17.85] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_special_requests <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |   |--- no_of_special_requests >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |--- arrival_month >  9.50
|   |   |   |   |   |   |   |--- repeated_guest <= 0.50
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 <= 0.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 19
|   |   |   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |--- room_type_reserved_Room_Type 4 >  0.50
|   |   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |--- repeated_guest >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |--- type_of_meal_plan_Not Selected >  0.50
|   |   |   |   |   |   |--- arrival_month <= 6.50
|   |   |   |   |   |   |   |--- lead_time <= 153.50
|   |   |   |   |   |   |   |   |--- lead_time <= 152.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 3.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  3.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  152.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- lead_time >  153.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 157.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  157.00
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  2.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 230.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_special_requests <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |   |--- no_of_special_requests >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |--- lead_time >  230.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 232.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  232.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |--- arrival_month >  6.50
|   |   |   |   |   |   |   |--- lead_time <= 176.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 7.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 173.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 152.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  152.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |--- lead_time >  173.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- arrival_month >  7.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 162.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  162.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |--- lead_time >  176.50
|   |   |   |   |   |   |   |   |--- lead_time <= 220.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 6.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 219.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  219.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  6.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- lead_time >  220.50
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- required_car_parking_space <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |   |--- required_car_parking_space >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 6.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  6.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.40] class: 1
|   |   |   |--- market_segment_type_Offline >  0.50
|   |   |   |   |--- lead_time <= 242.00
|   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 <= 0.50
|   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 235.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 20
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  235.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 6.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  6.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 18
|   |   |   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 184.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 179.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  179.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- lead_time >  184.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 190.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  190.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 15
|   |   |   |   |   |   |--- type_of_meal_plan_Meal Plan 2 >  0.50
|   |   |   |   |   |   |   |--- no_of_week_nights <= 5.50
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 219.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  219.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 8.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  8.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.70] class: 1
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |   |   |--- arrival_month <= 10.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 231.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  231.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- arrival_month >  10.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- no_of_week_nights >  5.50
|   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |--- weights: [2.25, 0.00] class: 0
|   |   |   |   |--- lead_time >  242.00
|   |   |   |   |   |--- arrival_month <= 11.50
|   |   |   |   |   |   |--- no_of_week_nights <= 1.50
|   |   |   |   |   |   |   |--- arrival_month <= 9.00
|   |   |   |   |   |   |   |   |--- arrival_month <= 6.50
|   |   |   |   |   |   |   |   |   |--- no_of_special_requests <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- lead_time <= 263.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- lead_time >  263.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 14.45] class: 1
|   |   |   |   |   |   |   |   |   |--- no_of_special_requests >  0.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 3.40] class: 1
|   |   |   |   |   |   |   |   |--- arrival_month >  6.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 12.75] class: 1
|   |   |   |   |   |   |   |--- arrival_month >  9.00
|   |   |   |   |   |   |   |   |--- no_of_week_nights <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 4.25] class: 1
|   |   |   |   |   |   |   |   |--- no_of_week_nights >  0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 266.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |--- lead_time >  266.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |--- no_of_week_nights >  1.50
|   |   |   |   |   |   |   |--- arrival_month <= 3.50
|   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- arrival_month >  3.50
|   |   |   |   |   |   |   |   |--- no_of_children <= 0.50
|   |   |   |   |   |   |   |   |   |--- lead_time <= 270.50
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights <= 4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |   |--- no_of_week_nights >  4.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- lead_time >  270.50
|   |   |   |   |   |   |   |   |   |   |--- arrival_month <= 4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- arrival_month >  4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |--- no_of_children >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |--- arrival_month >  11.50
|   |   |   |   |   |   |--- weights: [1.35, 0.00] class: 0
|   |   |--- no_of_special_requests >  2.50
|   |   |   |--- weights: [31.65, 0.00] class: 0

In [152]:
# Displaying important features in tree
importances = model_3.feature_importances_
indices = np.argsort(importances)

plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
  • According to the decision tree model, lead_time is the most important variable for predicting a booking cancelation. Next to the lead_time, the important variables are no_of_special_requests, arrival_month, no_of_week_nights, and market_segment_type_office

5.3 Model Performance Comparison and Conclusions

Performance Summary for the Full , Pre_Prunned, and Post Prunned Tree:

In [153]:
# training performance comparison

DT_models_train_comp_df = pd.concat(
    [
        decision_tree_perf_train1.T,
        decision_tree_perf_train2.T,
        decision_tree_perf_train3.T,
        decision_tree_perf_test3.T,
        decision_tree_perf_test2.T,
        decision_tree_perf_test3.T,
    ],
    axis=1,
)
DT_models_train_comp_df.columns = [
    "Full_Tree_Data_Training",
    "Pre_Prunned_Data_Training",
    "Post_Prunned_Data_Training",
    "Full_Tree_Data_Test",
    "Pre_Prunned_Data_Test",
    "Post_Prunned_Data_Test",
]

print("\nperformance comparison of all three decision tree models:")
DT_models_train_comp_df.T
performance comparison of all three decision tree models:
Out[153]:
Accuracy Recall Precision F1
Full_Tree_Data_Training 0.963803 0.999284 0.904242 0.949390
Pre_Prunned_Data_Training 0.747279 0.981373 0.575052 0.725176
Post_Prunned_Data_Training 0.962829 0.999079 0.902051 0.948089
Full_Tree_Data_Test 0.761450 0.678979 0.645008 0.661557
Pre_Prunned_Data_Test 0.717373 0.931829 0.552447 0.693652
Post_Prunned_Data_Test 0.761450 0.678979 0.645008 0.661557
Observations:
  • Overal, the performance measures on training data are better than the test data.
  • Performance measures are almost the same when Full Decision Tree (model_1) and Post_Prunned decision tree model (model_3) are used.
  • Precision measure is at highest when Full Decision Tree (model_1) or Post_Prunned decision tree model (model_2) is used.
  • Recull and f1 measures are at highest levels when Pre_Prunned decision tree model (model_2) is used.

Now, we use Post_Prunned decision tree model (model_3) and move towards the prediction part.

In [154]:
# predictions on the test set
pred = model_3.predict(x_test9)

df_pred_test = pd.DataFrame({"Actual": y_test2['booking_status'], "Predicted": pred})
df_pred_test.head(10)
Out[154]:
Actual Predicted
50934 0 0
42266 0 0
34994 0 0
14547 1 1
32158 0 0
53837 0 0
14845 0 1
5656 1 1
35144 1 1
15586 1 1

We can also visualize and compare the actual values and the predicted values as bar graph below:

In [155]:
df1 = df_pred_test.head(25)
df1.plot(kind="bar", figsize=(15, 7));

Performance comparison of the Chosen Decision Tree Model and the Chosen Logistic Regression Model:

In [156]:
print("\nperformance measures for the chosen regression model:")
LG=models_train_comp_df[['Lg model on training set- Threshold= 0.42','Lg model on test set- Threshold= 0.42']].T
LG
performance measures for the chosen regression model:
Out[156]:
Accuracy Recall Precision F1
Lg model on training set- Threshold= 0.42 0.785806 0.682325 0.685694 0.684005
Lg model on test set- Threshold= 0.42 0.785250 0.678979 0.690471 0.684676
In [157]:
print("\nperformance measures of the chosen decision tree model:")
DT=DT_models_train_comp_df[['Post_Prunned_Data_Training','Post_Prunned_Data_Test']].T
DT
performance measures of the chosen decision tree model:
Out[157]:
Accuracy Recall Precision F1
Post_Prunned_Data_Training 0.962829 0.999079 0.902051 0.948089
Post_Prunned_Data_Test 0.761450 0.678979 0.645008 0.661557

We plot the performance measure of the regression and the decision tree model as below:

In [158]:
T=pd.concat([LG,DT])
T.T.plot(kind="bar", figsize=(10, 5));
  • Different measures for the logistic regression model with threshold 0.42 and the post_prunned decision tree are shown in gragh above.
  • The decision tree performs better than the logistic regression model on the training set.
  • On the contrary, performance of the logistic regression model exceeds the decision tree when dealing with the test data.

Conclusions

  • We analyzed the "Star Hotels Guests" using two different classification techniques Logistic Regression and Decision Tree classifiers to build a predictive model for room booking cancelations of the hotel.
  • The proposed models can be used to predict if a booking is going to be canceled or not.
  • Compared to the proposed Decision Tree Classifiers our built Logistic Regression Classifiers have the advantage that they provide us with occurrance probability of booking cancelations. This enables us to have more insights toward the predictions.
  • When used different probability thresholds for the output of the Logistic Regression to be able to classify the cancelations while improving model performance measures.
  • The probability thresholds are defined based on investigations on Receiver Operating Characteristic (ROC) and Precision-Recall Curve on both the training and the test sets in effort to improve model performance measures.
  • We can see that threshold=0.33 and threshold=0.42 provide us with good performance measures.
  • Since we are interested to have a balance Recall and Precision, threshold=0.42 serves the best for us.

  • We also have built a Full decision tree, a Pre-Prunned decision tree, and a Post-Prunned decision tree for classifications.

  • We can see that the Full decision tree tends to overfit.
  • The Pre-Prunned tree is obtained by investigating the importance of hyper-parameters and the size of the Post-Prunned tree is controlled by analyzing Cost Complexities.
  • We have compared the purfomance measures of the proposed trees to get a better understanding the models. Easy interpretation is one of the key benefits of Decision Trees.
  • Generally, less data preparation is needed for Decision Trees and such a simple model gave good results even with outliers and imbalanced classes which shows the robustness of Decision Trees.
  • Since we are interested to have a balance Recall and Precision, the proposed Pre-Prunned tree serves the best for us.
  • When using the Pre-Prunned tree, lead_time is the most important variable for predicting booking cancelations. Next to the lead_time, the important variables are no_of_special_requests, arrival_month, no_of_week_nights, and market_segment_type_office.

  • The decision tree performs better than the logistic regression model on the training set. However, performance of the logistic regression model exceeds the decision tree when dealing with the test data.

Business Insights and Recommendations

According to the decision tree model the most important feature influencing on booking cancelation is the lead_time which indicates number of days between the date of booking and the arrival date. The next important features are:

  • no_of_special_requests: Total number of special requests made by the customer (e.g. high floor, view from the room, etc)
  • arrival_month: Month of arrival date
  • no_of_week_nights: Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel
  • market_segment_type_office: Market segment designation with office type

Based on the tree classification, we can classify the new room bookings and check how they contribute to the posible booking cancelations. For instance:

  • If lead_time <= 4, no_of_special_requests= 1, repeated_guest= 0, and no_of_week_nights >= 11 then there's a very high chance of booking cancelation by the guest.
  • If lead_time <= 5, repeated_guest =0, no_of_special_requests =0, and arrival_month = 1, then there's a very high chance that the booking will not be canceled by the guest.
  • If lead_time <= 2, no_of_special_requests =0, repeated_guest =0, arrival_month >= 2 & <=10, market_segment_type is not Offline, room_type_reserved_Room is Type 6, no_of_week_nights =0, then there's a very high chance that the booking will not be canceled by the guest.

This infomation helps Star Hotels mangers to recognize guests who have higher posibility of cancelations and take appropriate actions. For instance, if the booking does not seem to be promissing then other potential guests can be assigned to it. A waiting list can be prepared for this situations.

The hotel can use these insights to define a proper booking cancelation and refund policy. From the logstic model we learn that when the arrival time or the number of special requests increases, the probability of booking cancelation decreases. When market_segment_type is not office, the possibility of booking cancelation decreases. If new customers do not seem to contribute of opportunity loss (For instance, when market_segment_type is not office), hotel can regualte policies that eases refunding. This can help the bussiness by attracting more customers for the hotel. On the contrary, if new customers may risk the revenue of the hotel and contribute to opportunity loss, hotel can regualte more strict cancalation policies. This can help to reduce the possibility of cancelations by customers.